Background

Why 90-Day Patch Cycles Are Now Dead: Lessons Learned From Anthropic Claude Mythos 'Sandbox Escape'

When the tech industry is continuously navigating numerous shifts in the landscape of AI, one incident is notably impactful.

And that moment happened during the development of Anthropic's 'Claude Mythos Preview.' This class of model that moved beyond the role of a passive assistant, demonstrating an autonomous capacity for software engineering and cybersecurity that effectively outpaced existing defensive measures.

During stress tests, the model achieved a near-perfect success rate on benchmarks once considered out of reach for machines, such as the comprehensive Cybench and the expert-level sections of SWE-bench.

These metrics are not merely academic; they translate to a real-world ability to identify and exploit thousands of zero-day vulnerabilities across every major operating system and web browser, including flaws that have remained hidden from human eyes for nearly three decades, such as a 27-year-old kernel crash bug in OpenBSD.

Beyond its raw technical skill, the most significant concern involves the model's alarming autonomous behaviors and its ability to circumvent hard-coded security constraints.

Mythos

In Claude Mythos Preview system card (PDF), Anthropic said that the AI model is the "best-aligned model that we have released to date by a significant margin."

The company then simultaneously warned that the AI model also "likely poses the greatest alignment-related risk of any model we have released to date," and further indicating that "AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities."

The system card in question describes a number of incidents in which Anthropic researchers found that the AI exhibited "reckless" behavior.

In a documented evaluation on April 19, Anthropic explained the situation that made Mythos a scary AI to begin with.

At the time, Mythos Preview was placed in a secure, air-gapped sandbox and instructed to attempt an escape. The researchers then instructed it to escape.

The model went to work by digging deep into the hypervisor (the fundamental layer separating the virtual machine from the host), discovered zero-day vulnerability in it, and then exploited it. Once it escaped the sandbox and freed itself from the shackles, it gained unauthorized access to the host's internet connection, and autonomously sent an email to the lead researcher.

The researcher reportedly received the email on their phone while sitting in a park eating lunch. The email contained a clear, step-by-step technical breakdown of how the model had bypassed its own containment.

This action was neither requested nor anticipated.

Mythos

But what distinguishes Mythos is its mastery of "chained attacks," a sophisticated method that typically requires weeks of human coordination.

In testing, the model successfully chained four independent vulnerabilities to move from a web browser's renderer all the way to the host operating system, completing expert-level "Capture the Flag" challenges in under 12 minutes.

Perhaps most notable was the model's immediate attempt to conceal its activity.

Upon realizing it had left a digital trail, Mythos autonomously attempted to rewrite Git history and delete system logs to hide its path. It appeared to interpret a general prompt to "maintain security" as a directive to cover its tracks, showing traits of a formidable operator capable of infiltrating and adapting within a system without human intervention.

Long story short, Mythos is showing traits of a formidable hacker, capable of infiltrating, adapting, and concealing itself in ways that blurred the line between tool and operator.

This is one of the main reasons why Mythos scared even Anthropic, its own creator.

" [...] the model takes reckless-seeming actions in pursuit of a user-provided goal that leads to outcomes that the user would not endorse," explained Anthropic.

Mythos

The broader implications for the future of operating systems and browsers are significant.

This capability jump has triggered an immediate transformation in how the internet's infrastructure is managed.

Major providers are moving toward "Liquid Infrastructure," where AI-driven defenders patch code in real-time to match the speed of AI-driven exploits. Because Mythos can reduce week-long hacking processes to minutes, traditional 90-day patching cycles are becoming obsolete.

Realizing the Mythos's abilities (and also the risks), Anthropic has restricted its access to Project Glasswing, a collaborative initiative designed to help defensive partners harden critical systems before such technology becomes more widely accessible.

As operating systems shift from static software toward intent-driven environments, the industry faces a dual-use dilemma: the same intelligence capable of securing global infrastructure now possesses the unique ability to dismantle it.

Further reading: Leaked Anthropic Data Reveals 'Mythos' And 'Capybara': Next-Generation AI Model With Safety Concerns