Background

Anthropic's Approach To Code Review: Turning Claude Into A Multi-Agent System For Pull Requests

Claude Code Review

The current moment in AI can reasonably be described as an LLM war.

The release of ChatGPT by OpenAI in late 2022 turned large language models from research artifacts into widely used tools. Suddenly millions of people were interacting with AI systems daily, and the competition that followed quickly expanded beyond conversational assistants into search, productivity software, and developer tools.

Companies like Anthropic, Google, and Meta began releasing their own models and ecosystems, each attempting to define what the next generation of software interfaces might look like.

Among these competitors, Claude has taken a somewhat different path.

While most early attention around LLMs focused on chat interfaces and prompt-driven workflows, Claude has increasingly been positioned as an embedded collaborator inside complex systems, particularly in software development. Rather than acting only as a conversational assistant, the model is integrated into developer workflows through tools like Claude Code, which operates directly inside terminals, IDEs, and Git-based workflows.

This design reflects a broader shift in how LLMs are being used: not simply answering questions, but participating in the production of software itself.

The introduction of automated code review capabilities builds on that direction. In modern development teams, code review remains one of the most important (and time-consuming) steps in the software lifecycle.

Every pull request must be examined for correctness, security risks, style consistency, and long-term maintainability. As AI-assisted coding accelerates development speed, that review step increasingly becomes the bottleneck. Internal data shared alongside the feature's launch suggests that code output within engineering teams has increased significantly over the past year, while human review capacity has not grown at the same pace.

Claude’s approach to this problem is not simply to generate a single automated review.

Instead, the system distributes the review process across multiple specialized agents. These agents examine the pull request from different perspectives: checking adherence to project conventions, scanning for potential bugs, examining git history for context, validating code comments, and comparing changes against previous discussions.

Each agent produces findings that are scored on a confidence scale, and only issues above a configurable threshold are surfaced to developers, reducing noise and false positives that commonly affect static analysis tools.
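The filtering step described above can be sketched in a few lines. This is an illustrative model, not Anthropic's implementation: the `Finding` structure, the agent names, and the `surface` helper are all hypothetical, standing in for whatever internal representation the system uses.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    agent: str        # which specialized agent raised the issue
    message: str      # human-readable description of the issue
    confidence: float # 0.0-1.0 score assigned by the reviewing agent

def surface(findings: list[Finding], threshold: float = 0.8) -> list[Finding]:
    """Keep only findings whose confidence clears the configurable threshold."""
    return [f for f in findings if f.confidence >= threshold]

findings = [
    Finding("bug-scanner", "possible off-by-one in loop bound", 0.92),
    Finding("style-checker", "line may exceed project width limit", 0.40),
    Finding("security", "user input reaches a SQL string", 0.85),
]

# Only the two high-confidence findings reach the developer; the
# low-confidence style nit is suppressed as probable noise.
for f in surface(findings):
    print(f"[{f.agent}] {f.message} ({f.confidence:.2f})")
```

Raising or lowering the threshold is the knob that trades recall for noise: a strict threshold suppresses the false positives that plague static analyzers, at the cost of occasionally hiding a real but uncertain issue.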

This multi-agent design reflects a broader trend in LLM systems toward decomposition rather than monolithic reasoning.

Instead of relying on a single prompt to perform a complex task, the workflow divides responsibilities across specialized components that can operate in parallel. Research on LLM-based development workflows has shown that such role-based architectures, in which separate agents explore codebases, design solutions, and validate implementations, can significantly improve reliability in large software projects.
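The fan-out itself is straightforward to picture. In this sketch, each stub function stands in for a specialized agent (in the real system, each would wrap its own LLM call with its own prompt); the names and the dispatch pattern are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-perspective reviewers; each stub stands in for an LLM call
# with a prompt specialized to that perspective.
def check_conventions(diff: str) -> str:
    return f"conventions: reviewed {len(diff)} chars"

def scan_for_bugs(diff: str) -> str:
    return f"bugs: reviewed {len(diff)} chars"

def inspect_history(diff: str) -> str:
    return f"history: reviewed {len(diff)} chars"

AGENTS = [check_conventions, scan_for_bugs, inspect_history]

def review(diff: str) -> list[str]:
    # Each specialized agent examines the same diff independently and in
    # parallel; their findings are collected for a later merge/filter step.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = [pool.submit(agent, diff) for agent in AGENTS]
        return [f.result() for f in futures]

print(review("def f(x):\n    return x + 1\n"))
```

Because the agents are independent, adding a new review perspective means adding one more entry to the list rather than growing a single monolithic prompt.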

Another notable aspect of the system is its emphasis on context.

Traditional automated review tools often analyze code diffs in isolation, which limits their ability to understand developer intent.

Modern LLM-based approaches instead attempt to incorporate broader information such as related issues, commit history, and surrounding code structures. Academic work on context-enriched code review benchmarks shows that providing this additional context significantly improves the accuracy of automated review systems and enables more precise line-level feedback.
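Assembling that broader context is often just a matter of gathering a few git artifacts before prompting the model. The sketch below shows one plausible shape, assuming plain `git` subcommands and a hypothetical `build_review_context` helper; it is not the feature's actual context pipeline.

```python
import subprocess

def git(*args: str) -> str:
    """Run a git command in the current repository and return its stdout."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    return result.stdout

def build_review_context(base: str = "main") -> str:
    # Combine the diff with recent commit messages so the reviewer model
    # sees intent and history, not just the changed lines in isolation.
    history = git("log", "--oneline", "-10")
    diff = git("diff", base)
    return (
        f"## Recent commits\n{history}\n"
        f"## Diff against {base}\n{diff}"
    )
```

The resulting string would be prepended to the review prompt; richer variants might also pull in the linked issue text or prior review comments on the same files.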

The feature also integrates with existing development infrastructure rather than requiring teams to adopt new workflows.

For example, reviews can be triggered from the command line, automatically run through repository integrations, or executed through continuous integration systems. When vulnerabilities are detected, such as SQL injection risks, cross-site scripting issues, or insecure data handling, the system can also propose fixes or implement them directly, allowing developers to address security concerns before code reaches production.
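The SQL injection case is the classic example of a flaggable pattern with a mechanical fix. The snippet below, using Python's standard `sqlite3` module, shows the kind of before/after a review system might surface: the table and query are illustrative, but the vulnerability and the parameterized-query fix are standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name: str) -> list:
    # Flagged pattern: user input interpolated directly into the SQL string.
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(name: str) -> list:
    # Suggested fix: a parameterized query, so the driver treats the
    # input as a value rather than as SQL syntax.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

# The injected predicate makes the unsafe query match every row.
print(find_user_unsafe("x' OR '1'='1"))  # leaks all users
print(find_user_safe("x' OR '1'='1"))    # matches nothing
```

Because the fix preserves the query's intent while closing the hole, it is the sort of change an automated system can propose, or apply directly, with high confidence.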

Despite the growing capabilities of these systems, automated code review is not positioned as a replacement for human oversight.

Even within AI-assisted environments, the final decision to merge code remains a human responsibility. The goal is instead to shift the role of reviewers toward higher-level reasoning (architecture decisions, design tradeoffs, and system behavior), while routine pattern detection and consistency checks are handled by automated systems.

This shift reflects a deeper transformation underway in software engineering.

Tools like Claude are increasingly capable of generating code, understanding entire repositories, and coordinating multi-file changes across complex projects. In practice this means the software development pipeline itself is becoming partially automated: issues can be translated into code, tests can be generated, pull requests can be analyzed, and fixes can be suggested with minimal manual intervention.

The implications are still unfolding.

Early experiments suggest AI coding assistants can dramatically increase productivity in certain workflows, although they still require careful oversight and structured processes to avoid introducing errors or technical debt. In some reported cases, experienced engineers have used these tools to compress weeks of development work into days, while still needing to guide the system and validate its output.

What is emerging is not a replacement for developers, but a new form of collaboration between humans and machines.

As AI accelerates the pace of software creation, systems like automated code review attempt to rebalance the workflow by increasing the capacity for quality control.

In the context of the broader LLM competition, this illustrates a subtle but important difference in strategy: while many AI products compete on conversational ability, others are focusing on embedding intelligence directly into the infrastructure where work already happens.

Published: 10/03/2026