Perplexity Launches An 'Advanced' Version Of Deep Research, Pushing AI Beyond Answers Into Autonomous Research

The AI landscape has evolved into a fierce battle among large language models (LLMs).

What started with OpenAI's introduction of ChatGPT sparked an arms race in which Google, Anthropic, and many others try to outpace one another to deliver more powerful, versatile, and agentic systems.

But here's the thing.

What began as simple chatbots has now shifted toward "agentic" AI, which can be described as tools that don't just answer questions but actively research, reason, and produce in-depth outputs autonomously. In this emerging field, Perplexity AI stands out by blending real-time web search with advanced reasoning, positioning itself not as another general-purpose chatbot but as a dedicated answer engine focused on accuracy, citations, and verifiable sources.

With that in mind, Perplexity is launching an upgrade to Deep Research.

This 'Advanced' version pushes the boundaries of AI-driven research, achieving state-of-the-art performance on key benchmarks.

It topped the Google DeepMind Deep Search QA leaderboard with 79.5% accuracy, surpassing competitors like Moonshot K2.5 (77.1%), Anthropic's Opus 4.5 (76.1%), OpenAI's GPT-5.2 (71.3%), and Google's Gemini Deep Research Agent (66.1%). It also outperformed OpenAI's o3 and o4-mini models in real-world evaluations.

At its core, Deep Research, originally introduced in February 2025, allows users to generate comprehensive, multi-thousand-word reports on complex topics in just 2-4 minutes.

The tool performs dozens of iterative searches, reads hundreds of sources, refines its plan dynamically, and synthesizes everything into a clear, cited report.
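To make that plan, search, read, and refine cycle more concrete, here is a minimal sketch of an iterative research loop. It is not Perplexity's implementation, which the announcement does not describe. The toy `CORPUS`, the `search` and `refine_plan` helpers, and the `max_steps` budget are all hypothetical stand-ins: a real system would call a web search API and an LLM at these points.

```python
from dataclasses import dataclass, field

# Hypothetical toy corpus standing in for live web search results.
CORPUS = {
    "deep research": ["Deep Research runs iterative searches.", "It adds citations to sources."],
    "citations": ["Citations link claims to sources."],
}


@dataclass
class ResearchState:
    queries: list[str]                                          # pending queries (the evolving plan)
    notes: list[tuple[str, str]] = field(default_factory=list)  # (query, snippet) pairs read so far
    seen: set[str] = field(default_factory=set)                 # queries already executed


def search(query: str) -> list[str]:
    """Hypothetical search step: exact-key lookup in the toy corpus."""
    return CORPUS.get(query, [])


def refine_plan(state: ResearchState, snippet: str) -> list[str]:
    """Hypothetical refinement: queue any unseen corpus topic the snippet mentions."""
    return [topic for topic in CORPUS if topic in snippet.lower() and topic not in state.seen]


def deep_research(question: str, max_steps: int = 10) -> str:
    state = ResearchState(queries=[question])
    steps = 0
    # Keep searching while the plan has open queries and the step budget allows.
    while state.queries and steps < max_steps:
        query = state.queries.pop(0)
        if query in state.seen:
            continue  # a topic may be queued twice before it is executed
        state.seen.add(query)
        steps += 1
        for snippet in search(query):
            state.notes.append((query, snippet))
            state.queries.extend(refine_plan(state, snippet))
    # Synthesize: one line per note, each tagged with a numbered citation.
    lines = [f"Report: {question}"]
    for i, (query, snippet) in enumerate(state.notes, 1):
        lines.append(f"- {snippet} [{i}: {query}]")
    return "\n".join(lines)


print(deep_research("deep research"))
```

Running it on "deep research" reads two snippets, discovers "citations" as a follow-up topic, searches that too, and emits a three-item cited report. The step budget is the knob that trades thoroughness for speed, which is roughly the tension behind a "2-4 minute" target.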

The advanced iteration builds on this by pairing top-tier models (such as upgraded versions of Claude, Gemini, and others) with Perplexity's proprietary search infrastructure, delivering higher accuracy, better completeness, and improved objectivity across domains like finance, law, medicine, technology, and science.

To support these claims and foster industry progress, Perplexity open-sourced the DRACO benchmark (Deep Research Accuracy, Completeness, and Objectivity).

This new evaluation standard uses LLM-as-judge methodology to test AI agents on realistic, production-level research tasks. It's now publicly available, inviting developers and researchers to measure and improve deep research capabilities more rigorously.
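As a rough illustration of how LLM-as-judge scoring can work, the sketch below grades a report against a rubric grouped by the three dimensions DRACO is named for. DRACO's actual rubric format, weights, and prompts are not described here, so `RubricItem`, the weighting scheme, and `keyword_judge` are all hypothetical. In particular, `keyword_judge` is a simple substring check standing in for a call to an LLM judge.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RubricItem:
    dimension: str     # e.g. "accuracy", "completeness", "objectivity"
    criterion: str     # what the judge is asked to verify in the report
    weight: float = 1.0


# A judge takes (report, criterion) and returns a pass/fail verdict.
Judge = Callable[[str, str], bool]


def keyword_judge(report: str, criterion: str) -> bool:
    """Hypothetical stand-in for an LLM judge: passes if the criterion text appears."""
    return criterion.lower() in report.lower()


def score_report(report: str, rubric: list[RubricItem], judge: Judge) -> dict[str, float]:
    """Return the weighted fraction of rubric items passed, per dimension."""
    earned: dict[str, float] = defaultdict(float)
    possible: dict[str, float] = defaultdict(float)
    for item in rubric:
        possible[item.dimension] += item.weight
        if judge(report, item.criterion):
            earned[item.dimension] += item.weight
    return {dim: earned[dim] / possible[dim] for dim in possible}


# Hypothetical rubric and report for illustration only.
rubric = [
    RubricItem("accuracy", "79.5%"),
    RubricItem("completeness", "February 2025"),
    RubricItem("completeness", "DRACO"),
    RubricItem("objectivity", "limitations", weight=2.0),
]
report = "Advanced Deep Research scored 79.5% and Perplexity released DRACO."
print(score_report(report, rubric, keyword_judge))
# {'accuracy': 1.0, 'completeness': 0.5, 'objectivity': 0.0}
```

Because every verdict comes from the `judge` callable, whatever biases that judge has (an LLM's preferences, or this toy version's literal matching) pass straight into the final scores.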

Higher usage limits rolled out first to Max subscribers, with Pro users gaining access soon after.

This upgrade reinforces Perplexity's edge in the agentic era, where speed, reliability, and transparency matter most.

While some users have raised concerns about occasional inconsistencies in the past, the latest enhancements, combined with Perplexity's model-agnostic approach (letting subscribers switch between leading LLMs), make it a compelling choice for anyone needing expert-level insights without spending hours manually digging.

As the LLM wars intensify, Perplexity's focus on grounded, iterative research tools like Advanced Deep Research signals a maturing ecosystem: one where AI doesn't just compete on raw intelligence but on delivering trustworthy, actionable knowledge at unprecedented speed.

For researchers, analysts, students, and professionals, this could mark a turning point in how we tackle complex questions.

Still, even with these advances, Deep Research is not without limitations.

Like all LLM-driven systems, it depends on the quality and availability of online information. In domains where reliable sources are scarce, paywalled, region-specific, or highly specialized, outputs can still reflect gaps, outdated data, or overconfident synthesis. Citations improve transparency, but they do not fully eliminate the risk of subtle misinterpretations, weak sources being weighted too heavily, or nuanced expert debates being simplified into seemingly definitive conclusions.

Another challenge lies in evaluation itself.

While DRACO is a promising step toward standardized benchmarking, LLM-as-judge methodologies can introduce their own biases, particularly when systems are evaluated using AI-generated assessments rather than purely human expert review.

Benchmarks also tend to favor tasks that are measurable and structured, which may not fully capture messy, real-world research workflows involving ambiguity, conflicting evidence, or evolving data. High leaderboard scores, therefore, should be seen as indicators of progress, not guarantees of reliability in every scenario.

Published: 05/02/2026