
Large language models (LLMs) are extremely capable, but they are better at some things than others.
OpenAI's introduction of ChatGPT sent huge ripples across industries and communities. Almost instantly, everyone scrambled to catch up: researchers, companies, big labs, and startups all pushed to build LLMs that could reason, generate, assist, and automate.
The “LLM war” is real: each new model release is judged not just by its capacity to answer questions, but by how well it solves logic puzzles, writes code, reasons, and handles long contexts.
Google DeepMind has been in this battle for quite a while now: first with earlier versions of Gemini, then with subsequent releases that push the envelope of logical reasoning, algorithmic skill, and performance on coding benchmarks.
But as Gemini advanced, it wasn’t enough to just compete on benchmarks; practical utility and safety became key. Models that can think longer, evaluate their own output, and avoid common pitfalls get more attention.
There has also been pressure from the cost of flaws: security vulnerabilities and bugs are expensive.
So a model that can avoid introducing bugs, or help fix them, is a big deal.
This is where Google DeepMind’s CodeMender comes in.
“Software vulnerabilities can be notoriously time-consuming for developers to find and fix. Today, we’re sharing details about CodeMender: our new AI agent that uses Gemini Deep Think to automatically patch critical software vulnerabilities.” — Google DeepMind (@GoogleDeepMind), October 6, 2025
CodeMender aims to bridge the gap between raw generative power and real-world software reliability. Over roughly six months of development, it has already upstreamed 72 security fixes to open-source projects, including some large ones with millions of lines of code.
What makes CodeMender interesting is that it’s both reactive and proactive.
According to DeepMind’s blog post, CodeMender can reactively patch vulnerabilities as soon as they are discovered. Proactively, it goes through existing codebases and rewrites or annotates parts of them (for example, adding -fbounds-safety annotations) so that entire classes of vulnerabilities, such as buffer overflows, become much harder or impossible to exploit in the future.
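To make this concrete, here is a minimal sketch of what such an annotation can look like, assuming Clang’s experimental -fbounds-safety extension; the pixel_row struct and get_pixel function are hypothetical and not drawn from any code CodeMender has patched.

    /* Illustrative sketch, assuming Clang's experimental -fbounds-safety
     * extension (compile with: clang -fbounds-safety -c pixel_row.c).
     * The struct and function are hypothetical, not taken from libwebp. */
    #include <stddef.h>
    #include <stdint.h>

    struct pixel_row {
        uint8_t *__counted_by(width) pixels;  /* pointer is tied to the `width` field */
        size_t width;
    };

    /* Under -fbounds-safety, an index at or beyond `row->width` triggers a
     * run-time bounds-check trap instead of silently overflowing the heap. */
    static inline uint8_t get_pixel(const struct pixel_row *row, size_t i) {
        return row->pixels[i];
    }

The point of this style of annotation is that the bounds information travels with the pointer, so the compiler can insert checks wherever the buffer is actually accessed.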
Under the hood, the system leans heavily on Gemini Deep Think models (built for reasoning, logic, and algorithmic problem solving) to analyze code and find root causes, rather than making superficial or symptomatic fixes.
It uses tools such as static analysis, dynamic analysis, fuzzing, SMT solvers, and differential testing.
It generates candidate patches, but importantly it also has validation steps: checking that the patch doesn’t break functionality (no regressions), that it follows style guidelines, and that it really fixes what was wrong.
If something fails validation, the agent self-corrects, and only the high-quality patches are surfaced for human review.
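To illustrate the kind of regression check involved, here is a generic differential-testing sketch in libFuzzer style (an illustration only, not DeepMind’s actual harness); decode_original and decode_patched are hypothetical stand-ins for a routine before and after a candidate patch.

    /* Generic differential-testing sketch in libFuzzer style.
     * decode_original/decode_patched are hypothetical stand-ins for the
     * pre-patch and post-patch versions of the routine being validated. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    int decode_original(const uint8_t *in, size_t len, uint8_t *out, size_t cap);
    int decode_patched(const uint8_t *in, size_t len, uint8_t *out, size_t cap);

    /* Standard libFuzzer entry point: the fuzzer calls this with mutated inputs. */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
        uint8_t out_a[4096] = {0};
        uint8_t out_b[4096] = {0};

        int rc_a = decode_original(data, size, out_a, sizeof out_a);
        int rc_b = decode_patched(data, size, out_b, sizeof out_b);

        /* Diverging return codes or output bytes mean the patch changed
         * observable behaviour, i.e. a functional regression. */
        if (rc_a != rc_b || memcmp(out_a, out_b, sizeof out_a) != 0)
            abort();

        return 0;
    }

Built with clang -fsanitize=fuzzer,address and linked against both versions of the routine, such a harness reports any input on which the two implementations disagree, or on which the sanitizer flags a memory error, as a crash.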
One good example is libwebp, an image compression library with a known heap buffer overflow vulnerability (CVE-2023-4863) that had been used in a zero-click iOS exploit.
CodeMender proactively applied safety annotations to parts of the library so that such overflows are rendered unexploitable in those sections.
“CodeMender has already created and submitted 72 high-quality fixes for serious security issues in major open-source projects. It can instantly patch new flaws as well as rewrite old code to eliminate entire classes of vulnerabilities – saving developers significant time.” — Google DeepMind (@GoogleDeepMind), October 6, 2025
The implications are promising, and quite profound.
For open-source maintainers, companies, and software supply chains, tools like CodeMender (if matured, trustworthy, and widely adopted) mean less manual toil hunting bugs, faster patching, and fewer security crises.
It could shorten the time between vulnerability discovery and safe patching. It could also shift expectations: instead of “we’ll fix it in the next patch,” codebases are continuously strengthened. And for critical infrastructure and security-sensitive code, this kind of proactive hardening could reduce the attack surface.
On the flip side, there are risks and challenges. Automatic patches must be extremely reliable, since any new bug introduced by a patch can be dangerous. Then there is the issue of trust: developers and maintainers will need confidence that AI-generated or AI-suggested patches don’t introduce regressions, performance or compatibility issues, or unintended behavior. There is also the question of oversight and auditability: how transparent are the reasoning steps, and how explainable are the patches? And while DeepMind has human review checks in place now, scaling them while maintaining quality is nontrivial.
In sum, CodeMender is a strong sign that LLMs are moving from impressive demos and chat assistants toward deeper integration into software engineering workflows, not just to help create code but to help secure and maintain it.
If it delivers as promised, it could help shift the balance: less reactive scrambling when vulnerabilities are found, more ongoing stability and safety.
“This research is a major step toward automated security repairs and we’re looking forward to sharing technical papers and reports in the coming months. We hope to eventually release CodeMender as a tool for all developers, to enhance software safety for everyone →…” — Google DeepMind (@GoogleDeepMind), October 6, 2025