For most of AI's history, the story was simple: humans conceived, humans coded, and humans trained every generation of model. That story is now changing — fast. Anthropic, the safety-focused AI company behind the Claude family of models, has published one of the most candid and data-rich assessments of AI's self-accelerating nature ever released by a frontier lab.

The findings are striking. As of mid-2026, more than 80% of all code merged into Anthropic's production codebase was written by Claude — not by human engineers. Average code output per engineer has grown eightfold in just two years. And in at least one controlled experiment, Claude autonomously ran an open-ended AI safety research project from start to finish, recovering 97% of the theoretical performance ceiling that two human researchers had only partially reached.

This is not science fiction. This is Tuesday at Anthropic's offices.

80%+ of Anthropic's production code authored by Claude (May 2026)
8× more code per engineer per quarter versus 2021–2024
52× speedup Claude Mythos Preview achieved on a model training benchmark vs. starting code
~4-month doubling time for the length of tasks AI can reliably complete alone

What Is Recursive Self-Improvement?

The term sounds like the plot of a science fiction thriller, but its definition is precise: an AI system that can autonomously design, train, and improve its own successor — without a human directing each step of the process. Each improved model then improves its successor even further, creating a compounding loop of capability growth that is limited only by compute and data rather than by human working hours.

Anthropic is careful to note that we are not there yet, and that recursive self-improvement is not inevitable. But the data they've released shows that AI is already accelerating the development of AI — which is the key precondition. The loop may not be fully closed today, but it is being threaded.

"If this trend holds, tasks that take a skilled person days could come into range this year. In 2027, AI systems could be capable of tasks that take a person weeks."

— Anthropic Institute, June 2026

The Timeline: From Chatbots to Autonomous Agents

Anthropic's report traces a clear evolutionary arc in how AI has been used inside its own development process:

2021 – 2023
Building the First Claude
All work was done by humans on laptops, the same way every other tech company operated. AI models had no role in their own creation.

2023 – 2025
Chatbot Assistants
Engineers began using early Claude models to generate short code snippets, then manually copy-pasted the results into their editors. A convenience, not a transformation.

2025 – 2026
Coding Agents
Claude Code launched in early 2025. Models began writing and editing entire files autonomously: humans set the goals, and Claude devised and executed the methods.

Today (Mid-2026)
Autonomous Agents Running Other Agents
Claude now runs its own code, delegates hours of sub-tasks to other agent instances, and coordinates complex workflows. One human engineer now steers work that previously required a team.

20XX: The Horizon
Closing the Loop
The open question: can AI systems develop enough research judgment to design and train their own successors? That would complete the recursive self-improvement loop.

What the Benchmarks Show

Outside evaluators have been measuring this acceleration too. METR, an organization that evaluates how long AI systems can reliably work on tasks without human intervention, found that this time horizon has been doubling roughly every four months, up from an earlier rate of one doubling every seven months.

To put numbers on that: in March 2024, Claude Opus 3 could handle software tasks that take a human about four minutes. By March 2025, Claude Sonnet 3.7 handled tasks requiring about ninety minutes. By mid-2026, Claude Opus 4.6 was reliably completing tasks that would occupy a skilled human for twelve hours. METR's evaluation of Claude Mythos Preview found it could sustain focused autonomous work for at least sixteen hours — the upper boundary of what their measurement framework could even assess.
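As a rough check on how these data points relate to the quoted doubling time, the sketch below computes the doubling time implied by two task-length measurements, assuming steady exponential growth between them. The function names are illustrative, and "mid-2026" is assumed here to mean June 2026 (27 months after March 2024); neither comes from Anthropic or METR.

```python
import math


def implied_doubling_months(start_minutes, end_minutes, months_elapsed):
    """Months per doubling implied by two task-length measurements,
    assuming steady exponential growth between them."""
    doublings = math.log2(end_minutes / start_minutes)
    return months_elapsed / doublings


def project_minutes(current_minutes, months_ahead, doubling_months):
    """Task length after `months_ahead` months if the doubling time holds."""
    return current_minutes * 2 ** (months_ahead / doubling_months)


# Figures quoted in the article: (model, months since March 2024, task minutes).
# ASSUMPTION: "mid-2026" is treated as June 2026, i.e. month 27.
points = [
    ("Claude Opus 3", 0, 4),
    ("Claude Sonnet 3.7", 12, 90),
    ("Claude Opus 4.6", 27, 12 * 60),
]

first, last = points[0], points[-1]
overall = implied_doubling_months(first[2], last[2], last[1] - first[1])
print(f"Implied doubling time, {first[0]} to {last[0]}: {overall:.1f} months")

# Extrapolate one year ahead at the four-month doubling time quoted above.
hours_next_year = project_minutes(last[2], 12, 4) / 60
print(f"Projected task length one year later: {hours_next_year:.0f} hours")
```

Under these assumptions the three points imply an overall doubling time of about 3.6 months, close to the four-month figure, although the two individual intervals differ (about 2.7 and 5.0 months). Extrapolating twelve hours forward by one year at a four-month doubling time gives 96 hours, which is more than two 40-hour work weeks.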

On the SWE-bench software engineering benchmark — which presents AI models with real open-source codebases and genuine bug reports — scores moved from low single digits to benchmark saturation in just two years. A similar pattern appeared on CORE-Bench, which tests whether AI can reproduce existing research: from a 20% success rate in 2024 to saturation fifteen months later.

Evidence From Inside Anthropic

Public benchmarks tell part of the story. But the most compelling evidence comes from Anthropic's own internal data — metrics the company has never before shared publicly.

Claude Writes Most of Anthropic's Code

Lines of code merged per engineer remained essentially flat throughout 2021–2024. That changed when Claude Code launched in February 2025, allowing the model to run code rather than just suggest it. The productivity curve inflected upward. It steepened again in 2026 as Claude gained the ability to manage long autonomous work sessions. By Q2 2026, the typical Anthropic engineer was merging eight times as much code per day as in 2024.

One particularly dramatic example: in April 2026, Claude shipped over 800 targeted fixes that reduced a class of API errors by a factor of one thousand. The engineer who oversaw the work estimated that a human would have required four years to do the same — the problem demanded holding vast amounts of unfamiliar context simultaneously, something AI handles without the cognitive fatigue humans experience.

"I started leaning hard into using Claude about a year ago. That's been a crazy adventure, and it's now been roughly five months since I last wrote any code myself."

— Anthropic employee, May 2026

Claude Is Getting Better at Research Judgment

This is the more consequential finding, because executing defined tasks is not the same as choosing which tasks matter. Anthropic's report carefully distinguishes between two levels of AI capability: running a specified experiment (which Claude now does extremely well), and deciding which experiments are worth running (which Claude is still developing).

On a kernel optimization benchmark where Claude is given training code and asked to make it run as fast as possible, Opus 4 achieved roughly a three-times speedup in May 2025. By April 2026, Claude Mythos Preview hit a fifty-two-times speedup on the same task. For comparison, a skilled human researcher would need four to eight hours to achieve a four-times improvement.

In a separate internal evaluation (n=129 sessions), Anthropic tested whether Claude could suggest a better "next step" than the human researcher during real open-ended investigations. In November 2025, Claude Opus 4.5 outperformed humans 51% of the time. By April 2026, Mythos Preview did so 64% of the time.
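To get a feel for how far these win rates sit from a coin flip, the sketch below runs a simple normal-approximation test against a 50% baseline. It assumes the 129 sessions are independent and that the same sample size applies to each model's evaluation; the article does not confirm either point, and the function name is illustrative, so treat the result as a back-of-the-envelope reading rather than Anthropic's own analysis.

```python
import math


def win_rate_vs_chance(win_rate, n_sessions):
    """Return (z, two_sided_p) for an observed win rate against a 50% null,
    using the normal approximation to the binomial distribution."""
    standard_error = math.sqrt(0.5 * 0.5 / n_sessions)
    z = (win_rate - 0.5) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# ASSUMPTION: n = 129 applies to each model's evaluation separately.
N_SESSIONS = 129

for model, rate in [("Claude Opus 4.5", 0.51), ("Claude Mythos Preview", 0.64)]:
    z, p = win_rate_vs_chance(rate, N_SESSIONS)
    print(f"{model}: z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions, 51% is about 0.2 standard errors above chance, which is statistically indistinguishable from a coin flip, while 64% is about 3.2 standard errors above it.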

Key Takeaways for AI Watchers

  • AI already handles over 80% of Anthropic's production code — this is not a hypothetical future state.
  • The gap between AI and humans on research judgment is closing, not holding steady.
  • Claude Mythos Preview ran a full open-ended AI safety research project autonomously, recovering 97% of the theoretical performance ceiling two humans only partially reached.
  • Anthropic is calling for global coordination mechanisms — similar to arms control treaties — to govern recursive self-improvement if it arrives.
  • Even without full recursive improvement, today's AI could enable a 100-person company to operate at the scale of a 1,000-person organization.

The Three Futures Anthropic Sees

Rather than predicting one outcome, Anthropic lays out three plausible scenarios, listed here in order of rising stakes:

01
Possible · Least Likely

The Trend Stalls

Progress hits diminishing returns — an S-curve rather than a continuing exponential. Today's capabilities diffuse widely but don't compound further. Governments gain time to adapt. Anthropic considers this unlikely given all current trajectories.

02
Likely · Near-Term

Compounding Efficiency

AI development stays substantially automated, but humans retain direction-setting. A 100-person company does the work of 100,000. Knowledge work is revolutionized. The risk: efficiency gains can also enable authoritarian surveillance and influence operations at unprecedented scale.

03
Uncertain · Highest Stakes

Full Recursive Self-Improvement

AI builds its own successors without human involvement. The pace of AI progress becomes limited only by compute supply. Humans shift to verification and oversight roles. The alignment problem — whether AI systems remain beneficial as they grow more capable — becomes critical and urgent.

What Should Be Done About It?

Anthropic does not offer a comfortable reassurance here. The company says it would be "likely a good thing" to have the option to slow or temporarily pause frontier AI development — but acknowledges that a unilateral slowdown by one lab would simply hand the lead to competitors. Without a verified, multi-party coordination mechanism, competitive and geopolitical pressures will push all actors to keep building.

Anthropic draws an explicit parallel to arms control treaties, noting that those regimes took decades to build — time the world may not have for AI governance. The company is committing to convene policymakers, researchers, civil society groups, and other AI companies to begin that process, and will publish what emerges from those conversations.

Meanwhile, Project Glasswing — Anthropic's program providing limited access to Claude Mythos Preview to trusted organizations — has already demonstrated what advanced AI can do when pointed at security. In its first weeks, Mythos Preview identified more than ten thousand high- and critical-severity software vulnerabilities across the world's most important systems. The bottleneck in cyber defense has already shifted: finding vulnerabilities is no longer the hard part — patching them fast enough is.


Frequently Asked Questions

What is recursive self-improvement?
Recursive self-improvement occurs when an AI system can autonomously design, train, and deploy improved versions of itself without step-by-step human direction. Each improved version can then build an even better successor, potentially creating an accelerating cycle of capability growth.

Is Claude already improving itself?
Claude now authors the majority of Anthropic's production codebase (over 80% as of May 2026) and can autonomously run certain research experiments. However, humans still set the research direction and evaluate results. The loop is not yet fully closed, but the evidence indicates the gap is narrowing.

What is Claude Mythos Preview?
Claude Mythos Preview is Anthropic's most capable frontier model. It is currently not publicly available due to cybersecurity considerations around its extraordinary capability. It is being used selectively through Project Glasswing with a small number of trusted research and security organizations.

Is recursive self-improvement dangerous?
Anthropic takes the risk seriously. Their concern is that if AI systems can build successors without adequate human oversight, small misalignments in values or behavior could compound across generations of models. This is why alignment research, which aims to ensure AI systems remain beneficial as they grow more capable, is the company's core mission.

What does this mean for businesses and workers?
The most direct near-term impact is on knowledge work. Even without full recursive self-improvement, today's AI tools mean that small teams can accomplish what previously required large organizations. This creates extraordinary opportunity, and it also raises important questions about labor, economic distribution, and governance that societies have barely begun to address.

📌 Source: This article analyzes research published by the Anthropic Institute in June 2026. Original piece: anthropic.com/institute/recursive-self-improvement. All statistics cited are from Anthropic's published data.