Offensive AI Agents have arrived, and businesses need to have technical and legal protections in place to deal with the risks
This article briefly explains why businesses of all kinds and sizes need to get very serious about protecting themselves – legally and technically – from new risks posed by the emergence of “offensive AI Agents”.
What is an AI Agent?
Artificial intelligence frontier leader Anthropic describes an agent as “an AI model that directs its own processes and tool use when accomplishing a task—deciding for itself how to achieve what users want, rather than following a fixed script.”
Which sounds very helpful, and fairly benign.
In the right hands, AI Agents are incredibly powerful, self-directing AI solutions that can in essence make their own plans and develop solutions to business issues, more quickly and at far less cost than a team of analysts, coders and executives.
Given a strategic goal or set of goals to solve for, AI Agents can devise tactics, run a tactical program, recalibrate in real time based on outcomes of implemented tactics, adjust and run again – and again – and again – all in milliseconds, to achieve the strategic goal set.
They don’t need human direction beyond the strategic objective, and indeed operate so quickly and through such a huge volume of iterations that no human or team of humans could keep up with them.
What are “offensive” AI Agents?
Offensive AI agents are designed to penetrate the security defences of other systems – secure document repositories, operating systems, web browsers, AI agent services. They attack a target system to analyse and reveal its vulnerabilities and, depending on the developer’s intent, to actually breach it.
The nature of the attack – malicious or benign – depends on the developer’s definition of the strategic objective of the attack.
A real-world example – can a one-man startup hack a global advisory firm?
This is a real scenario from 2026:
- CodeWall, a UK based startup, on 28th February 2026 used a home-made “autonomous, offensive AI Agent” to penetrate the AI systems of global advisory firm McKinsey;
- within 2 hours, the AI Agent had infiltrated McKinsey’s AI systems so completely that it had visibility of 94,000 secure internal “workspaces” (internal vaults), hundreds of thousands of emails and internal communications and vast amounts of other secure, internal information;
- On 1 March, the start-up disclosed to McKinsey the results of its penetration tests;
- On 2 March, McKinsey implemented the defensive patches the AI Agent’s tests had identified.
Happy ending – this time
The CodeWall “offensive” penetration of McKinsey was, in a sense, performative – CodeWall wanted to demonstrate its capabilities, and meant no harm. But it could have been much, much more serious.
McKinsey’s systems were not only unable to defend against the CodeWall breach, they were unable to detect it until it had already happened. What would the result have been if the offensive AI Agent had been unleashed by a bad actor, with criminal or other malicious intent?
Meanwhile, at the big end of town …
On 7 April 2026, as part of an initiative dubbed Project Glasswing, frontier AI firm Anthropic released a new AI Agent, Mythos, to a select beta testing group, prioritising partner organisations whose digital services underpin the internet. Global bank JPMorganChase was also in the beta trial group.
Anthropic described Mythos (full name “Claude Mythos Preview”) as “a striking leap” in capability compared to its previous frontier model, Claude Opus 4.6.
Anthropic’s rationale for a strictly limited release of Mythos is revealing as to its own concerns about the model’s capabilities:
“Claude Mythos Preview is our most capable frontier model to date.
[Mythos’] large increase in capabilities has led us to decide not to make it generally available. Instead, we are using it as part of a defensive cybersecurity program with a limited set of partners.”
Within days, Mythos had proved itself the most powerful hacking tool in history, penetrating the most secure systems of some of the largest, wealthiest and most capable software, internet and financial institutions in the world.
When the results of the beta tests were outlined on 14 April, there were shocked intakes of breath around the world. Mythos immediately became the subject of concerned discussion among the world’s leading central bankers, at an IMF meeting in Washington DC ostensibly convened to discuss the Iran War.
Speaking on 17 April 2026 specifically about Mythos’ ability to uncover and exploit vulnerabilities at machine speed and the implications for financial stability, Bank of England Governor Andrew Bailey observed:
“It is a very serious challenge for all of us.”
Christine Lagarde, President of the European Central Bank, reflected a similar level of concern:
“There is no framework in place to actually mind those things.”
Mythos had uncovered vulnerabilities in every system tested, in some cases deficiencies that had gone undetected for years. Fewer than 1% of vulnerabilities found by Mythos had been patched, meaning either that defensive systems were unable to close gaps as fast as Mythos found them, or that Mythos was able to work around defences faster than the patches could be implemented.
In running its penetration tests, Mythos had, in effect, played simultaneous war games against multiple defensive systems in several of the world’s most technically proficient organisations, and had defeated them all.
This was hacking at an entirely new level.
Wait – it gets worse
Back at start-up CodeWall, the next target on the list was Boston Consulting Group. CodeWall’s agent found an unprotected API (application programming interface) endpoint; having identified this, the agent “walked through the front door”:
“What the agent had access to was BCG’s Workforce Analytics (WFA) data warehouse …” which gave the agent unauthenticated access to “individual employment data on hundreds of millions of real people, spanning millions of companies worldwide.”
Individual position histories, compensation, skills records, join and leave records.
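For technically minded readers, the gap described here, an endpoint that answers requests carrying no credentials, is something an organisation can look for in its own systems. The following is a minimal sketch only, not a description of CodeWall's method. It assumes the organisation has already sent credential-free test requests to its own endpoints and recorded the HTTP status codes; the paths, the `ProbeResult` type and the `PUBLIC_BY_DESIGN` list are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    path: str    # endpoint path that was requested (hypothetical examples below)
    status: int  # HTTP status returned to a request sent WITHOUT credentials


def unauthenticated_endpoints(results):
    """Return paths that answered a credential-free request with a 2xx status."""
    return [r.path for r in results if 200 <= r.status < 300]


# Hypothetical results from probing an organisation's own API inventory.
probes = [
    ProbeResult("/api/health", 200),            # intentionally public
    ProbeResult("/api/workforce/export", 200),  # should have demanded credentials
    ProbeResult("/api/admin/users", 401),       # correctly rejected
    ProbeResult("/api/reports", 403),           # correctly rejected
]

# Endpoints the organisation has deliberately chosen to leave open.
PUBLIC_BY_DESIGN = {"/api/health"}

exposed = [p for p in unauthenticated_endpoints(probes) if p not in PUBLIC_BY_DESIGN]
print(exposed)
```

Anything left in `exposed` is an endpoint that served data to an anonymous caller without anyone having decided it should, which is the category of finding worth escalating first.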
Then, on 13th April, CodeWall announced they had run another penetration test, this time on global consultancy Bain’s AI system, Pyxis. Within 18 minutes, CodeWall had identified a simple JavaScript file containing partially redacted login credentials, resolved the missing credential elements and accessed the Bain system. Virtually instantly, the agent was able to create a comprehensive “attack map”:
“This meant that an attacker could create new accounts, modify existing ones, and embed themselves directly into Bain’s identity infrastructure.” Even if the pathway provided by the original credential was closed, “… the attacker who had already used this path would still have access.”
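Credentials left in client-side script files are a well-known class of weakness, and organisations can scan their own published code for them. The sketch below is illustrative only and is not the technique CodeWall used: the single pattern, the function name and the sample file contents are assumptions, and real secret scanners apply far larger rule sets.

```python
import re

# One illustrative pattern: a credential-like key name followed by a quoted value.
CREDENTIAL_PATTERNS = [
    re.compile(
        r"""\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*["']([^"']+)["']""",
        re.IGNORECASE,
    ),
]


def find_credential_like_strings(source: str):
    """Return (line_number, key_name) pairs for credential-like assignments.

    The matched value is deliberately not returned, so that a scan report
    does not itself become a new copy of the secret.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern in CREDENTIAL_PATTERNS:
            for match in pattern.finditer(line):
                findings.append((line_no, match.group(1)))
    return findings


# Hypothetical contents of a published JavaScript file.
sample_js = '''
const config = {
  apiBase: "https://example.invalid/api",
  apiKey: "abc123-REDACTED",
  username: "svc-user",
  password: "hunter2",
};
'''

findings = find_credential_like_strings(sample_js)
print(findings)
```

Note that a partially redacted value is still flagged. As the Bain example suggests, partial redaction may not be enough if the remaining elements can be worked out.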
What can we learn from this?
What do these events, within the last 2 months, tell us about the potential for AI Agents to penetrate supposedly secure IT systems; to access personal and sensitive client information, valuable intellectual property and information?
There are 2 pretty clear inferences open:
- First, we are now in a new world of data insecurity, a new era of breach potential, and one where capital and scale are – for the hackers – no longer issues. More on this inference in a moment.
- Second, offensive AI Agents will exploit the similarities in organisation type to inform attack strategies – it was, in part, the similarities in the McKinsey, Bain and BCG businesses and operating models that helped CodeWall breach their systems.
Accounting firms, law firms, financial advisory firms, investment banks, human resources consultancies, insurance brokers – take note!
An offensive AI Agent breach of one practice has potential to lead to rapid replication across the sector; the offensive AI agent will almost instantly learn and leverage the vulnerabilities of similar practice types.
Good actors or Bad Actors – how important is the difference?
In both cases – CodeWall vs McKinsey/BCG/Bain and the Claude Mythos Preview beta release – the “offensive AI Agents” were deployed by friendly actors; there was no malign intent.
In the Mythos case, the beta testing organisations themselves ran the Mythos penetration tests, within their own controlled environments, and within the ambit of a broader, planned penetration risk program (Project Glasswing).
But Anthropic makes a few important points[1]:
“Agents act with less human oversight, so there is more room for them to misread users’ intent and take actions with unintended consequences.
Agents are also targets for ‘prompt injection’ cyberattacks, which try to trick models into taking costly actions that they otherwise wouldn’t.
Building agents that are both useful and trustworthy requires making careful product decisions.”
Agents can be designed by developers with malicious intent just as easily as those with benign, even good, intent. “Good” Agents can be manipulated by well-crafted “prompt injections”.
While little is known about the company that undertook the McKinsey, BCG and Bain hacks – CodeWall – or its principal, UK Companies House records confirm the company was incorporated on 30th March 2026 and has a single director. Basic background checks indicate the director is an experienced IT security analyst, a system penetration specialist, previously employed by a high-end UK legal and security management firm, with industry standard IT security credentials – not a rocket scientist, nor an MIT or Oxford mathematician. CodeWall is unlisted, with no record of significant capital raising, corporate or venture capital backing.
A sole trader, a ‘bootstrapped’ venture, incorporated for under a month, penetrates the systems of 3 of the world’s leading consulting practices, accessing hundreds of millions of internal, client and personal data records.
CodeWall could have been any number of hackers anywhere in the world, operating with very different and much less benign intentions. It was extremely good fortune for McKinsey, BCG and Bain that CodeWall and its principal were not bad actors – criminal or state – just a super-smart, security-experienced Londoner with great AI skills.
And while the institutions participating in Anthropic’s Project Glasswing had a heads up this time, their reality is scarcely less worrying. What if a less friendly offensive agent had got there first?
What are regulators doing to control the risks?
Across the Asia-Pacific region and around the world, governments and institutions are attempting to find an appropriate regulatory balance that allows for rapid adoption of AI and realisation of its productivity promise, while preserving data privacy protections, preventing misuse for anti-competition activity, attempting to maintain human control and accountability and preventing AI from causing “catastrophic harms”.
Approaches range from the EU’s relatively prescriptive Artificial Intelligence Act to Australia’s decision not, at this stage, to pass specific AI-directed legislation, but to instead review and adjust existing regulation as required.
California’s SB 53 (the Transparency in Frontier Artificial Intelligence Act)[2] – the first AI-specific regulation in the US – seeks to address the potential that an AI system could cause mass harm or serious economic damage – referred to as “catastrophic risk.”
Well-intentioned AI providers such as Anthropic and AI infrastructure partners like Amazon AWS, government agencies, and regulatory bodies are seeking to create open-source technical infrastructure and transparency in AI architecture to manage the risks.
At the end of the day though, as Anthropic also notes, “None of these measures replace the work that model developers have to do to build safe and secure agents.”
If a developer wants to create an offensive AI Agent for nefarious purposes, current regulatory arrangements do not have the teeth to prevent them.
What do businesses need to take from this?
Organisations of all sizes need to take action to insulate themselves from the risks posed by the new generation of offensive AI Agents – legally and technically.
It is common, for example, for organisations to include in general Privacy Disclosures wording along these lines:
“We maintain physical, electronic and procedural safeguards in accordance with the technical state of the art and legal data protection requirements to protect your personal data from unauthorised access or intrusion.”
Similar language is often found in large-scale construction and infrastructure contracts, requiring EPC companies and key suppliers to maintain state-of-art systems.
In this new, AI agent-enabled hacking world, what is the “technical state of the art”? How can organisations prove that they made reasonable efforts to maintain appropriate “electronic and procedural safeguards”?
For small and medium enterprises in particular – professional services firms included – and for many larger organisations too, access to tools like Claude Mythos to identify and close off vulnerabilities may be unachievable for some time to come, or at all.
Still, it is not a solution for organisations to stand still, or to throw up their hands and say it’s all too hard, in the face of the new threats presented by malicious, offensive AI Agents. Responsibilities for protection of personal data and valuable proprietary information don’t evaporate because penetration risks escalate. And they have just escalated.
Legal defence requires that organisations at risk do something to ameliorate the risks.
Here are some steps organisations can take to increase their legal and technical protection in the face of escalating, offensive AI-powered hacking threats:
- Reviews of data, AI and security governance must be prioritised:
– This must be a whole of organisation review, not an IT project.
– Reviews must as a minimum analyse current and historic technical structures for data collection, access and security, to identify and close security gaps,
– They must upgrade security awareness training for system users,
– But they must also go further, with the goal of establishing AI-Data Governance Frameworks that clearly define how the organisation plans to incorporate AI into its operations, how it will deal with offensive AI threats, and how stakeholders will be consulted and kept informed,
– “Opt-in/Opt-Out” arrangements may need to be considered, where clients are given the option to have their data excluded from AI-accessible repositories, and to refuse the use of AI tools for service delivery,
– AI-Data Governance Frameworks need to be designed to evolve quickly and efficiently, to identify and respond to new learnings about offensive AI agents, and to integrate technical, legal and ethical considerations.
- Privacy Impact Assessments must be undertaken for new / changed business operations:
– Changes to business operations, driven by AI or otherwise, must be analysed with a focus on potential impacts on types of personal information likely to be collected, their storage, deletion and the purposes of collection and use,
– For any business process change, especially where AI is to be deployed, a Privacy Impact Assessment is a logical first step,
– Where AI agents are intended to be deployed in new operational models, AI Impact Assessments also need to be undertaken, to specifically analyse and identify new vulnerabilities that AI adoption may create.
- AI usage itself must be central to the reviews:
– The McKinsey, BCG and Bain examples show that embedding AI tools and agents into operating models creates new vulnerabilities,
– For professional services firms especially, the productivity benefits of AI agents cannot be ignored, but nor can the risks,
– The solution is not to stop AI use, but to bring AI use into the centre of organisational risk management and compliance planning.
“AI for hackers” is here
Mandiant’s M-Trends is a leading review of threats and tactics used in data system breaches. The M-Trends 2026 report[3] notes:
“Recent Google Threat Intelligence Group reporting confirms that state and financially motivated actors are integrating AI to accelerate the attack lifecycle” and are “increasingly relying on large language models [frontier AI models] as a strategic force multiplier…”.
As malign AI agent developers begin to harness the power of AI, organisations of all sizes need to respond by, as a minimum, taking action to identify their potential vulnerabilities and re-crafting technical and legal defences.
[1] Anthropic: Trustworthy agents in practice, 9 Apr 2026
[2] Bill Text – SB-53 Artificial intelligence models: large developers.
[3] Special Report: Mandiant M-Trends 2026
This article is for general information purposes only and does not constitute legal or professional advice. It should not be used as a substitute for legal advice relating to your particular circumstances. Please also note that the law may have changed since the date of this article.