A More Balanced AI Story


When I look at the current excitement around AI in testing, I cannot help feeling a sense of déjà vu. I felt the same thing when the first wave of no-code and low-code testing tools arrived. At the time, vendors promised that these tools would democratise testing, eliminate the need for technical skills, and make delivery faster and less dependent on engineering effort.

What actually happened was more complicated. Some teams used those tools thoughtfully and got real value from them. Many others discovered that the optimistic narrative came with a long list of trade-offs that only surfaced once the tools were embedded in the workflow. Skill gaps appeared. Ownership blurred. Tools consumed more resources than expected. People ended up doing more maintenance work, not less. And, of course, vendor lock-in became a quiet but very real constraint.

The current generation of AI for testing feels remarkably similar. The marketing messages are more polished, the underlying technology is more impressive, and the promises are certainly grander. But the conversation is still one-sided. The risks and long-term implications are treated as afterthoughts, if they are mentioned at all.

I want to offer a more balanced perspective, grounded in what I have seen teams struggle with before.

One of the first things missing from today’s AI conversation is any acknowledgment of the environmental cost. We talk endlessly about efficiency and speed, but rarely about the computational and energy footprint of pushing test work through large models. When deterministic checks use a fraction of the resources, it is worth asking whether the trade-off is justified. It is the same pattern as before: we fixate on headline benefits and ignore the hidden costs.

Another familiar pattern is the quiet erosion of skills. When low-code tools appeared, I watched teams lose the ability to debug their own automation because they no longer understood the underlying logic. The same thing is happening now with AI. Some teams are already leaning so heavily on generative models that people are losing the architectural understanding needed to catch subtle failures. Vendors call this democratisation, but anyone who has had to untangle a broken system knows that those skills still matter.

Legal liability is another area where history repeats itself. When earlier tools made autonomous changes to systems under the promise of “self-healing”, nobody could agree on who was accountable when those changes went wrong. AI agents introduce the same ambiguity, but at a larger scale. Most organisations do not realise how little protection there is in vendor contracts when autonomous actions cause real damage.

Then there is the question of performance. In the low-code era, teams were told that tools would scale effortlessly, only to discover that they hit a reliability plateau under real-world conditions. I see the same optimism now with LLMs. The assumption is that failures are due to user error, prompt design, or immature integration. Very few people are willing to consider that the models themselves may not be suitable for certain types of testing, no matter how much tuning you do.

The human cost is also familiar. When testers became maintainers of brittle low-code scripts, the job became less satisfying. AI risks creating a new version of that problem. When people spend their days verifying and correcting AI output, it quietly erodes the creative and investigative parts of the role. The work becomes narrower, not broader.

And of course, the lock-in problem remains. Early automation tools trapped organisations in proprietary ecosystems, and AI fine-tuning is shaping up to follow the same pattern with bigger stakes. If you train a vendor model on months of your domain knowledge, you may not own any of the intelligence you helped create. Leaving the vendor often means starting again from zero.

None of this is an argument against using AI. I have seen AI improve quality work in meaningful ways when adopted thoughtfully. But I have also seen what happens when teams buy into a one-sided story and only discover the trade-offs once the tool is embedded and difficult to unwind.

We have been here before. The tools are new. The patterns are not.

If you would like a more visual version of this conversation, I have a short YouTube video, made with NotebookLM, that covers the same themes for people who prefer watching to reading.

And if you want to explore how AI can genuinely support testing and quality work without repeating the mistakes of previous tooling waves, I am sharing more on this soon.
