Artificial intelligence (AI) giant Anthropic has revealed that it disrupted a Chinese state-sponsored espionage operation powered by AI.
In what it described as the first campaign of its kind, a Chinese state-backed group used AI’s ‘agentic’ capabilities to execute cyberattacks targeting tech companies, financial institutions, chemical manufacturers, and government agencies, according to Anthropic.
In a report released on Monday, Anthropic said the threat actor, assessed with high confidence to be linked to the Chinese state, manipulated its Claude Code tool to attempt infiltration of roughly 30 global targets, succeeding in a small number of cases.
Anthropic said the attackers used AI’s ‘agentic’ capabilities to turn the model from a mere adviser into the executor of the cyberattacks, calling it the first documented case of a large-scale cyberattack carried out without substantial human intervention.
After discovering the campaign in mid-September, Anthropic launched an investigation to map its severity and full extent.
As part of its response, Anthropic banned accounts as they were identified, notified affected entities where appropriate, and coordinated with authorities as the probe gathered actionable intelligence, the report said.
China is a world leader in cybercrime. Estimates suggest that 30–40 per cent of global cyberattacks originate in China. In recent years, research has shown that Chinese state-sponsored attacks have surged by up to 150 per cent. These campaigns have repeatedly targeted critical systems such as government databases, energy utilities, technology firms and financial networks.
How AI became a spy
Anthropic said attackers exploited the advanced nature of AI to mount these attacks.
AI models have now reached a level where they can follow complex instructions and understand context, enabling highly sophisticated tasks. Claude’s advanced coding capabilities made it particularly useful for this operation.
With such advanced capabilities, AI models can act as ‘agents’, meaning they can function autonomously with minimal human input. Self-driving cars are real-world examples of this ‘agentic’ behaviour.
Moreover, AI models often have access to a wide array of software tools — frequently via the open standard Model Context Protocol. Instead of being mere chatbots, they can now search the web, retrieve data and perform actions previously reserved for human operators.
Cybercriminals can exploit these tools, combined with advanced reasoning and agentic abilities, to turn AI models into password crackers, network scanners and other offensive security tooling.
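To make that mechanism concrete, below is a minimal Python sketch of the agent-plus-tools pattern the report describes: a loop that dispatches calls from a tool registry the way an agent runtime would. Every name in it, and the canned ‘plan’ standing in for the model’s decisions, is an illustrative assumption; the example is deliberately benign and is not code from Anthropic’s report or the attackers’ framework.

    # Hypothetical sketch of an agentic tool loop; all names are
    # illustrative assumptions, not Anthropic's or the attackers' code.
    from urllib.request import urlopen

    def fetch_url(url: str) -> str:
        """Benign example tool: retrieve the start of a web page."""
        with urlopen(url, timeout=10) as resp:
            return resp.read(500).decode("utf-8", errors="replace")

    def word_count(text: str) -> str:
        """Benign example tool: summarise retrieved text."""
        return f"{len(text.split())} words"

    # A tool registry: in real deployments the Model Context Protocol
    # plays this role, advertising tools to the model over a standard
    # interface so the model can invoke them itself.
    TOOLS = {"fetch_url": fetch_url, "word_count": word_count}

    # Stand-in for the model's decisions: a real agent chooses each call
    # autonomously; a canned plan keeps this sketch self-contained.
    plan = [
        ("fetch_url", {"url": "https://example.com"}),
        ("word_count", None),  # None: reuse the previous step's output
    ]

    result = ""
    for tool_name, args in plan:
        if args is None:
            args = {"text": result}  # chain prior output into this call
        result = TOOLS[tool_name](**args)  # dispatch, as a runtime would
        print(f"{tool_name} -> {result[:60]!r}")

The design point is the registry: once tools are exposed this way, it is the model, not a human operator, that decides which tool to call and in what order, which is exactly the autonomy the report says the attackers abused.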
How China turned Claude into a spy — step by step
The espionage campaign was multi-phase and highly automated, Anthropic said.
Attackers used AI to perform 80–90 per cent of the campaign, intervening only at four to six critical decision points per hacking attempt.
In the first phase, cybercriminals selected targets and built an attack framework designed to autonomously compromise systems with minimal human involvement. This framework used Claude Code as an automated tool.
They convinced Claude that its tasks were not harmful. This was crucial because Claude is trained to avoid malicious activity. The attackers achieved it by jailbreaking the model, tricking it into bypassing its guardrails: they broke the attacks down into small, seemingly innocent tasks so Claude would execute them without seeing the full context or realising their nefarious purpose.
As part of the ruse, attackers claimed to be employees of a legitimate cybersecurity firm conducting defensive testing.
In the second phase, attackers used Claude Code to inspect target organisations’ systems and identify high-value databases.
“Claude was able to perform this reconnaissance in a fraction of the time it would’ve taken a team of human hackers. It then reported back to the human operators with a summary of its findings,” Anthropic said.
In the third phase, attackers had Claude identify and test vulnerabilities, researching and writing its own exploit code. The model then harvested usernames and passwords that granted deeper access, and extracted large volumes of private data, categorising it by intelligence value.
At this stage, Claude identified high-privilege accounts, created backdoors, and extracted data with minimal human supervision.
In the final phase, attackers used Claude to produce detailed documentation of the attack and to compile files of stolen credentials and analysed systems, resources that would aid future campaigns.
As Claude could fire off thousands of requests, often several per second, the attackers dramatically accelerated their operation.
“The sheer amount of work performed by the AI would have taken vast amounts of time for a human team. The AI achieved an attack speed that would have been, for human hackers, simply impossible to match,” Anthropic said.
However, the company noted that Claude’s work was not flawless: it occasionally hallucinated credentials or claimed to have extracted secret information that was actually public. Ironically, these shortcomings became an obstacle to fully autonomous cyberattacks.