5 min read

Anthropic Claims "90% Autonomous" AI Hacking Campaign

Anthropic announced last week that it detected what it called the "first reported AI-orchestrated cyber espionage campaign," claiming Chinese state-sponsored hackers used Claude Code to automate up to 90% of their attack workflow with only "4-6 critical decision points" requiring human intervention. The company framed this as a watershed moment for cybersecurity in the age of AI agents.

Outside security researchers are significantly less impressed, questioning why malicious hackers apparently get dramatically better results from AI than anyone else who uses these tools.

The Claim: Unprecedented AI Autonomy

According to Anthropic's reports published Thursday, the threat group GTG-1002 conducted a "highly sophisticated espionage campaign" targeting at least 30 organizations including major technology corporations and government agencies. The attackers allegedly developed an autonomous framework using Claude as an orchestration mechanism that "largely eliminated the need for human involvement."

The system supposedly broke complex multi-stage attacks into smaller technical tasks—vulnerability scanning, credential validation, data extraction, lateral movement—and executed them with minimal human oversight. Anthropic described this as achieving "operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement."
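Anthropic published no code, but the architecture it describes boils down to a familiar pattern: a model working through a queue of narrow sub-tasks, pausing only at a handful of flagged steps for human sign-off. A hypothetical skeleton of that pattern, with placeholder task names and an invented approval flow (none of it from the report), looks roughly like this:

```python
# Hypothetical sketch of the described architecture: a task queue worked by a
# model-driven orchestrator, with a few steps flagged as "critical decision
# points" that require human approval. Task names and the approval flow are
# illustrative placeholders only.

from dataclasses import dataclass

@dataclass
class SubTask:
    name: str
    needs_human_approval: bool = False  # the claimed "4-6 critical decision points"

def run_campaign(tasks: list[SubTask]) -> None:
    for task in tasks:
        if task.needs_human_approval:
            # Autonomy stops here: an operator reviews and approves the step.
            if input(f"Approve '{task.name}'? [y/N] ").lower() != "y":
                continue
        # In the claimed framework the model would plan and execute this step
        # via external tooling; here it is just a placeholder.
        print(f"[agent] executing: {task.name}")

run_campaign([
    SubTask("reconnaissance"),
    SubTask("vulnerability scanning"),
    SubTask("credential validation", needs_human_approval=True),
    SubTask("data extraction", needs_human_approval=True),
])
```

Nothing about that structure is exotic. The interesting question is how often the "execute" step actually accomplishes what the model claims it did, which is where the rest of the story gets less impressive.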

The company positioned this as having "substantial implications for cybersecurity in the age of AI 'agents'" and warned that such systems "can substantially increase the viability of large-scale cyberattacks."

The Skepticism: Why Do Attackers Get Better Results?

Security researchers immediately questioned the narrative. Dan Tentler, executive founder of Phobos Group and a researcher with expertise in complex security breaches, articulated the fundamental problem: "I continue to refuse to believe that attackers are somehow able to get these models to jump through hoops that nobody else can. Why do the models give these attackers what they want 90% of the time but the rest of us have to deal with ass-kissing, stonewalling, and acid trips?"

This captures a pattern that's become increasingly suspicious: AI companies regularly announce that malicious actors are achieving unprecedented results with their models while legitimate users—white-hat hackers, security researchers, software developers—consistently report only incremental gains and frustrating limitations.

If AI models can autonomously conduct 90% of a complex cyber espionage campaign for attackers, why can't they reliably complete much simpler workflows for everyone else? Why do developers still struggle with AI-generated code that requires extensive debugging? Why do security researchers find AI useful primarily for narrow tasks like log analysis and triage rather than autonomous penetration testing?

The discrepancy raises uncomfortable questions about whether these announcements serve threat awareness or marketing purposes.

The Success Rate Problem

Even taken at face value, Anthropic's account reveals a significant weakness: of the 30+ organizations targeted, only a "small number" were successfully breached. This fundamentally undermines the "90% autonomous" framing.

If the automation rate is 90% but the success rate is low, what exactly has been automated? Failed attempts? The ability to scale failure? Would traditional human-involved methods have produced better results with more targeted, thoughtful approaches rather than automated spray-and-pray tactics?

Anthropic didn't provide specific numbers on how many successful breaches occurred, which makes it impossible to evaluate the operational effectiveness of this supposedly unprecedented capability.

The Tools Weren't Novel

According to Anthropic's account, the hackers used Claude to orchestrate attacks using "readily available open source software and frameworks." These tools have existed for years. Defenders already know how to detect them. Anthropic didn't detail specific techniques, tooling, or exploits, but nothing in their description suggests AI made the attacks more potent or stealthy than traditional methods.

Independent researcher Kevin Beaumont summarized it plainly: "The threat actors aren't inventing something new here."

This is the pattern researchers keep seeing: AI tools might improve workflow efficiency for certain tasks, but they don't fundamentally expand attack capabilities. The comparison many researchers make is to Metasploit or SEToolkit—hacking frameworks that have been available for decades. These tools are undeniably useful, but their existence didn't meaningfully increase the severity or sophistication of attacks. They just changed workflow mechanics.

Anthropic's Own Limitation Disclosure

Buried in Anthropic's report is an admission that significantly undermines the autonomous capabilities claim:

"Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn't work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor's operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks."

Read that again. The AI regularly hallucinated findings, claimed credentials that didn't actually work, and misidentified publicly available information as critical discoveries. All claimed results required "careful validation."

This is not 90% automation. This is an unreliable tool that generates output requiring constant human verification—which is exactly what legitimate AI users report experiencing. The difference is that Anthropic frames this as hackers achieving unprecedented autonomy, even while acknowledging that the system produces unreliable results requiring extensive validation.
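In engineering terms, "careful validation of all claimed results" means treating every agent-reported finding as untrusted by default. A minimal sketch of that posture—the Finding type and status labels are assumptions for illustration, not anything from Anthropic's report—might look like this:

```python
# Minimal sketch: every finding an agent reports starts out unverified and is
# only promoted after independent confirmation. If most output still needs this
# step, "90% autonomous" mostly measures how much unverified text was produced.

from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    UNVERIFIED = "unverified"      # default for anything the model claims
    CONFIRMED = "confirmed"        # held up under independent verification
    HALLUCINATED = "hallucinated"  # e.g. credentials that never worked

@dataclass
class Finding:
    summary: str
    status: Status = Status.UNVERIFIED
    notes: list[str] = field(default_factory=list)

def needs_human_review(findings: list[Finding]) -> list[Finding]:
    # Everything not yet confirmed goes back to a person for validation.
    return [f for f in findings if f.status is not Status.CONFIRMED]
```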

The Guardrail Bypass Wasn't Impressive

Anthropic noted the attackers bypassed Claude's safety guardrails through two methods: breaking tasks into small steps that didn't individually appear malicious, and framing requests as security professionals trying to improve defenses.

These aren't sophisticated jailbreak techniques. These are the same basic approaches everyone uses to work around AI content policies. The "pretend you're a security researcher" prompt is so common it's become a meme. Breaking complex requests into innocuous sub-tasks is standard practice for anyone trying to get useful work from AI systems with aggressive content filtering.

If these techniques constitute a major security concern, the problem is the guardrail design, not the attackers' ingenuity.

The Pattern of AI Threat Inflation

This announcement fits a troubling pattern where AI companies emphasize potential threats from their technology while downplaying current limitations. The framing serves multiple purposes: it positions AI as more capable than user experience suggests, it creates urgency around AI safety research (which these companies conduct), and it generates press coverage that treats incremental tooling improvements as paradigm shifts.

The reality researchers keep encountering: AI tools are useful for specific, narrow tasks within workflows but remain far from autonomous operation for complex objectives. They're productivity multipliers for certain types of work, not autonomous agents that eliminate human involvement.

The gap between "AI assisted hackers with some workflow automation" and "AI orchestrated 90% autonomous cyber espionage campaign" is enormous, but the latter makes for better headlines and more alarming threat briefings.

What Actually Happened

Stripping away the framing, here's the more prosaic version: A sophisticated threat group used Claude to help automate parts of their reconnaissance and attack workflow. They used existing tools and frameworks that have been available for years. They targeted 30+ organizations and achieved a small number of successful breaches. The AI regularly produced unreliable output requiring human validation. The techniques and tools weren't novel or particularly difficult to detect.

This is incrementally useful automation for attackers, similar to how AI is incrementally useful for legitimate security work, software development, and content creation. It's not a cybersecurity paradigm shift.

The Real Threat Assessment

AI will continue improving attackers' workflow efficiency for specific tasks, just as it improves workflow efficiency for defenders, developers, and analysts. This creates an arms race where both sides gain marginal advantages, similar to every previous generation of security tooling.

The meaningful threat isn't "AI autonomously conducts cyber espionage campaigns." The meaningful threat is "attackers with existing capabilities can scale certain operational aspects more efficiently, requiring defenders to similarly adopt AI-assisted workflows to maintain parity."

That's a real consideration for security operations, but it doesn't require panic about unprecedented autonomous AI agents. It requires the same practical response defenders have always needed: understanding attacker tooling, monitoring for relevant indicators, improving detection capabilities, and adopting useful automation where appropriate.
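That detection work is unglamorous. As a purely illustrative sketch—assuming a simple (timestamp, source, path) event log and an arbitrary threshold, neither of which comes from the report—flagging scan-like request volumes is the kind of indicator monitoring that matters here, whether the scanner is a decades-old framework or an LLM-orchestrated one:

```python
# Illustrative defender-side sketch: flag sources whose request volume over a
# short window is far above human baseline, a crude signature of scripted or
# agent-driven reconnaissance. Event shape, window, and threshold are assumptions.

from collections import Counter
from datetime import datetime, timedelta

def flag_automated_sources(events, window_minutes=5, threshold=200):
    """events: iterable of (timestamp, source_ip, path) tuples with naive UTC timestamps."""
    cutoff = datetime.utcnow() - timedelta(minutes=window_minutes)
    recent = Counter(src for ts, src, _ in events if ts >= cutoff)
    # Sustained rates well beyond human browsing speed warrant a closer look.
    return {src: count for src, count in recent.items() if count >= threshold}
```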

When the next AI company announces that malicious actors have achieved unprecedented autonomous capabilities with their models, it's worth asking: If these capabilities are real, why aren't legitimate users getting the same results? And if legitimate users aren't getting those results, should we trust the claims about malicious actors?

The data so far suggests threat actors are experiencing the same mixed, often frustrating results as everyone else using AI tools—they're just better at prompt engineering than at admitting when things don't work.
