
Why AI won't be replacing the human-led penetration test in the near future

Christopher Whitmore, Penetration Tester, shares expert insights into why AI can't, and shouldn't, replace human-led penetration testing.

Megan Thomas

The world of cyber security has not escaped the shift brought about by rapid developments in AI. While these innovations are exciting and offer real benefits in terms of efficiency and scale, they are leading some to believe that AI can replace human-led security testing altogether.

We've been keeping an eye on how AI has been impacting penetration testing specifically. One example that caught our eye was XBOW, a self-proclaimed 'autonomous' AI vulnerability scanning tool that topped the US leaderboard on the HackerOne bug bounty programme, sparking widespread debate about the implications of using AI in security testing.

Over a 90-day period, XBOW submitted around 1,060 vulnerability reports to the programme, with 132 of those vulnerabilities confirmed as valid, or 'awarded', and a further 303 recognised as valid but not yet triaged. While these numbers are impressive at first glance, statistics can be misleading (that's marketing for you). Cut a different way, just over one in ten of the vulnerabilities submitted by the scanner were actually awarded, meaning humans had to spend more time investigating false positives, which pulls them away from work that is genuinely beneficial.
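
To put those figures in context, here is the back-of-the-envelope arithmetic (the 90-day totals come from the reporting above; the rates are our own calculation):

    # XBOW's reported 90-day figures (taken from the stats above)
    submitted = 1060
    awarded = 132          # confirmed as valid and awarded
    valid_untriaged = 303  # recognised as valid but not yet triaged

    award_rate = awarded / submitted
    best_case = (awarded + valid_untriaged) / submitted

    print(f"Award rate: {award_rate:.1%}")  # ~12.5%, i.e. just over 1 in 10
    print(f"Best case if every untriaged report holds up: {best_case:.1%}")  # ~41.0%

Even on the most generous reading, more than half of the scanner's reports led nowhere, and every one of them needed a human to look at it.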

This exposes some key flaws in the claim that AI alone can replace traditional penetration tests without any level of human intervention.

There are still key limitations of AI when it comes to replacing human-led penetration testing:

1. AI lacks human intuition and contextual understanding

While it's true that AI excels at large-scale repetitive tasks, like identifying vulnerability patterns, it's not able to consider business logic or the context of unpredictable or complex scenarios (such as custom-built web apps). Every organisation is unique, with its own systems, workflows, and risk profile, and therefore requires tailored security mechanisms to protect its specific business interests.

It's widely acknowledged that AI lacks the creativity and adaptability of a human tester in building an understanding of the system being tested, capabilities that underpin more advanced threat modelling and complex exploitation. This could lead to AI misinterpreting the level of threat, or overlooking novel or complex vulnerabilities, leaving critical gaps in your business's risk profile.

2. AI still needs human validation and quality assurance

Even though XBOW is described as a “fully autonomous AI-driven penetration tester”, human input is still required to validate the quality and accuracy of reports to comply with HackerOne's policies. 

This highlights two important points: 

  • An AI penetration test can never be truly independent from human validation.
  • With only around one in ten of XBOW’s total reports over a 90-day period confirmed as valid, AI penetration test tools can generate a lot of false positives that could overwhelm security teams and reduce confidence in automated systems. 

Human testers are essential for validating findings, interpreting risk and ensuring contextual, actionable results are produced. 

3. AI is limited to what it’s 'trained' to do

By design, AI systems can only work with the data they've been 'trained' on. For penetration testing, this means an AI tool is limited to detecting 'known' vulnerabilities it has already been programmed to detect, so it is likely to miss new or complex attack vectors or misinterpret unique data - a huge gap when it comes to having true oversight of your risk landscape.

Putting this into a real-world example: consider a totally custom-built web application with no third-party software components. An AI system might know how to look for common configuration issues and have a general methodology for testing the application, but it would struggle to test the application's custom logic. These are the areas where more nuanced vulnerabilities could be hiding, and where a human penetration tester is trained to uncover them.
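
To make 'custom logic' concrete, here is a hypothetical sketch of the kind of flaw we mean; the function and its names are our own invention, not taken from any real application:

    # Hypothetical checkout logic; names are illustrative only.
    def apply_voucher(order_total: float, voucher_percent: float) -> float:
        # Business-logic flaw: voucher_percent is taken from the client
        # request and never bounded, so a value over 100 produces a
        # negative total - the shop ends up owing the attacker money.
        return order_total * (1 - voucher_percent / 100)

    # A signature-based scanner sees no SQL injection or XSS here and stays
    # silent; a human tester who understands the purchase workflow would try
    # values like 100 or 150 and spot the bug immediately.
    print(apply_voucher(80.0, 150))  # -40.0

No known vulnerability pattern matches this code, which is exactly why flaws like it tend to survive automated testing.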

 

4. AI doesn’t always act 'appropriately'

One major limitation of fully automating penetration testing with AI is the loss of human control. Every organisation has unique systems and security requirements, which means there are often specific considerations or constraints that need to be followed during testing. Because these nuances can be complex and context-specific, an AI system might struggle to interpret them correctly, which could lead to security issues, data breaches, inaccurate reporting and other risks. 

Let’s take the example of an organisation that uses a production system for land ownership applications, where testers are instructed not to interact with certain options that would make an application public. A fully automated AI might ignore or misunderstand those constraints, unintentionally submitting large numbers of fake applications to public records.

In another scenario, imagine the same application has a searchable public database of land ownership records, but it’s vulnerable to SQL injection. An AI tool might try to ‘test’ this vulnerability by executing destructive payloads, like deleting records or even an entire database, without understanding the potential consequences. Unlike human testers, AI tools lack the contextual judgement to know the difference between safely proving a vulnerability and causing real damage. (In fact, similar behaviour has been observed in AI-assisted code generation tools like Copilot.)
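
To make that distinction concrete, here is a minimal, self-contained sketch (the table and queries are hypothetical) showing how the same injection point supports both a harmless proof and a destructive payload:

    import sqlite3

    # Hypothetical vulnerable search: user input is concatenated into SQL.
    def search_records(conn, owner_name: str):
        query = f"SELECT * FROM land_records WHERE owner = '{owner_name}'"
        return conn.execute(query).fetchall()  # injectable

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE land_records (owner TEXT)")
    conn.execute("INSERT INTO land_records VALUES ('Alice')")

    # A human tester proves the flaw non-destructively: a tautology that
    # returns every row demonstrates injection without touching the data.
    print(search_records(conn, "x' OR '1'='1"))  # [('Alice',)]

    # A destructive payload an unconstrained tool might fire instead:
    #   "x'; DROP TABLE land_records; --"
    # (sqlite3 happens to refuse stacked statements, but many database
    # drivers would execute this and destroy the records.)

The judgement call - prove the vulnerability exists, don't detonate it - is exactly the kind of contextual decision AI tools are poor at.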

This reinforces why human validation and quality assurance remain essential to penetration testing for maintaining control and delivering high-quality results aligned with each client’s specific needs.

 

5. AI can't replace human expertise, but it can enhance it

The recent development of tools like XBOW has shown that AI-driven tools can significantly improve the efficiency, scale and coverage of testing. However, the real strength lies in combining AI tools with human expertise, rather than replacing it.

To get the most value out of penetration testing services, organisations need to adopt a combined approach: using AI tools alongside skilled human penetration testers. This allows teams to leverage the efficiency and scale of AI while ensuring the quality of testing and a comprehensive view of the risk landscape, including complex and emerging threats.

AI tools can support and enhance penetration testing services through:

  • Automated reconnaissance: AI can automatically scrape and analyse applications and services to identify technologies in use, misconfigurations or missing security headers (see the sketch after this list).
  • Vulnerability pattern recognition: AI can enhance a human tester's ability to recognise patterns in applications, especially in source code, much faster. This gives human testers more time to analyse the contextual impact of the vulnerabilities found, provide more detailed reporting and make more tailored recommendations for remediation.
  • Dynamic research: Some AI tools can aggregate and analyse data from multiple sources, helping human testers research attack techniques and develop payloads faster. This reduces the overall time taken to exploit complex vulnerabilities during testing.
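
As a minimal sketch of what the automated reconnaissance bullet might look like in practice (the target URL and header list are illustrative assumptions, not any particular tool's behaviour):

    # Minimal security-header reconnaissance sketch; the header list is a
    # small illustrative subset of what real tooling would check.
    import urllib.request

    EXPECTED_HEADERS = [
        "Content-Security-Policy",
        "Strict-Transport-Security",
        "X-Content-Type-Options",
        "X-Frame-Options",
    ]

    def missing_security_headers(url: str) -> list[str]:
        # Fetch the page and report which expected headers are absent.
        with urllib.request.urlopen(url) as response:
            present = {name.lower() for name in response.headers.keys()}
        return [h for h in EXPECTED_HEADERS if h.lower() not in present]

    # Hypothetical target; only ever scan systems you are authorised to test.
    print(missing_security_headers("https://example.com"))

A tool can run checks like this across thousands of endpoints in minutes; deciding which of the findings actually matter is where the human tester comes in.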

It's important to remember that many vulnerabilities are unique and contextual to an organisation, and sometimes a deep understanding of individual workflows is needed. Some vulnerabilities can also cause side effects in software during testing, so relying on AI in production environments with limited control can be dangerous.

AI is transforming the security testing landscape – and although there are clear advantages like improved efficiency, coverage and identification of patterns, there are lots of limitations to consider. A complete lack of human contextual understanding and interpretation could lead to false positives, unsafe testing and other security risks. 

The safest and most effective approach is a combination of skilled human testers who leverage AI tools and technology to enhance speed and scale. 

How PGI does things

At PGI, our experts combine deep human insight with advanced technology and tools to deliver assessments that go beyond surface-level checks. This approach ensures that our penetration testing is context-driven, evidence-based, and tailored to the unique risks of your organisation.

Get in touch with our team today to find out how we can support you with security testing and beyond.