
AI red teaming is niche now, but we know where it’s heading

Guy O’Donnell, Strategic Client Lead

HMRC has just signed a £175 million, ten-year contract with Quantexa, a UK-based data, analytics and AI software company. The system will pull together fragmented data across government to flag tax fraud and direct enforcement effort, at scale. HMRC oversees roughly £800 billion in tax revenue a year, so by any measure, this is well and truly consequential AI.

Most of the coverage has treated it as a procurement story: a large contract, a well-funded vendor, a government getting serious about compliance.

But that framing misses what I think is the more important question: what happens when someone decides to attack it? We’ve seen this before. A new class of critical system appears, organisations focus first on capability and rollout, and only later realise that adversaries will test it too.

AI systems that decide things will be attacked

This isn't just speculation; the pattern is already established. Wherever a system makes decisions with money or consequence attached, attackers will probe it. Fraud teams have spent years watching adversaries map the edges of rules-based detection systems, learning which transaction patterns trigger alerts and which don't. 

AI systems may be more sophisticated targets, but the adversarial logic is identical.

The examples aren't theoretical:

  • Banks have confirmed deepfake voice fraud incidents, where synthesised audio was used to impersonate account holders and authorise transfers.
  • Malware classifiers and spam filters have faced well-documented evasion attacks – researchers and criminals have spent years learning to craft inputs that slide past automated detection. 
  • Incidents involving customer-facing AI, with the DPD and Air Canada chatbots among the most cited cases, demonstrated that production systems could be manipulated or misled through their own interfaces by ordinary users with no specialist knowledge.

The techniques exist and are in use. The question isn’t whether AI systems used in tax enforcement, benefits assessment or financial regulation will face adversarial pressure; it’s:

Which ones and how soon?

Cyber learned this lesson – it took fifteen years

We have a useful precedent. It just played out so slowly that it's easy to forget how it happened.

In 1999, The Matrix introduced millions of people to the idea that a networked system could be infiltrated, manipulated and turned against its operators. That same year, penetration testing was a niche discipline practised by a handful of researchers. Most organisations had no idea it existed. In 2004, PCI DSS made it mandatory for card processors. At the time, this was considered a blunt regulatory intervention, driven by card fraud costs that had reached levels demanding a response. CREST was founded in 2006 to professionalise and credential the practice. NIST published SP 800-115, its technical guide to security testing and assessment, including penetration testing, in 2008. By the mid-2010s, no regulated organisation was procuring serious technology without a pen testing requirement somewhere in the process.

That's roughly fifteen years from niche to a standard procurement requirement. There were four key forces that drove it:

  • Regulator expectation
  • Insurance pricing
  • Post-incident learning
  • Board accountability

None of those four emerged all at once; they accumulated, reinforced each other and eventually made the question “has this been tested?” as unremarkable as asking whether a building has a fire alarm.

All four forces are now in motion for AI.

AI red teaming will follow the same path, faster

The regulatory direction is already clear:

  • NIST's AI Risk Management Framework was published in 2023
  • The EU AI Act requires conformity assessment for high-risk AI systems
  • The UK AI Safety Institute and its US counterpart are conducting red team evaluations of frontier models
  • ISO/IEC 42001, published in 2023, has established a management system standard for AI

None of this mandates AI red teaming as a specific, credentialled practice… yet. But the direction is unmistakable, and the gap between ‘direction’ and ‘mandatory’ is closing faster than it did for pen testing. Let’s be honest: the regulatory cycle moves quicker now, and public and political attention on AI is far higher than it ever was on cyber security in 2004. Insurers are also beginning to ask about AI exposure within cyber liability policies. That is the pricing signal that, in retrospect, did as much as any regulation to normalise pen testing as routine hygiene.

My estimate: AI red teaming will become a procurement expectation in regulated sectors within five to seven years. Financial services, healthcare and the public sector will get there sooner.

What's missing right now is the infrastructure the cyber industry built: there’s no CREST equivalent, no agreed engagement standard, and no common methodology for what an AI red team should produce or how findings should be reported. That gap will close. The question for organisations procuring AI today is whether to wait for the standard to exist before taking the risk seriously, which is exactly what most organisations did with cyber, and exactly why the 2000s were so expensive.

Why the skills that matter here are not the obvious ones

AI red teaming sits at the intersection of three distinct capabilities: 

  • Thinking like an attacker
  • Hands-on technical probing of live systems
  • Understanding the decision workflows the AI is embedded in

Most AI consultancies have none of these. They understand model architecture, not adversarial tradecraft. Most pen testing firms have one (usually the hands-on technical piece) but lack the threat intelligence background and the investigative depth to understand how a system's decisions can be gamed by someone who has spent time understanding its purpose.

The combination is important because the most dangerous attacks on consequential AI systems won't come from people probing APIs at random. They'll come from people who understand what the system is trying to decide, what evidence it relies on and how to shape that evidence to produce a different outcome. That requires threat intelligence thinking, not just technical testing.

This is exactly the kind of work PGI has been building toward through adjacent disciplines. Our experience across fraud, financial crime, cyber investigation and threat-led testing means we are used to examining how systems behave when someone is actively trying to manipulate an outcome, not just how they perform under normal conditions.

The question you, the buyer, should be asking

If you are procuring AI for anything consequential (enforcement, fraud detection, credit decisions, benefits eligibility), the primary question is probably not “is it accurate?”. Accuracy is the vendor's pitch: it tells you how the system performs against known inputs, not how it behaves when someone is deliberately trying to manipulate it.

The question that tells you more is: “Has it been attacked and what can it survive?”

That question doesn't have a standard answer yet, but it will. Organisations that start asking it now, before the framework exists, before the regulator requires it, will be better placed than those who wait to be told.

We're not watching the beginning of a new discipline. We're watching a repeat of one.

Let’s talk

Most organisations will wait for regulation. The smarter ones won’t. If you're procuring AI for anything consequential, now is the time to test whether it can be manipulated before someone else does. We've spent years thinking like attackers and we can apply that mindset to the AI systems you rely on. Let's talk.