AI and test automation: pitfalls when generating end-to-end tests

Georg Dörgeloh May 20, 2025

As a business expert like a business analyst or product owner, you’ve probably been here before: you describe what seems like a straightforward test case and think, “This can’t be that difficult… just click here, enter something there, and check the result.” Then comes sprint planning, where you discover that test automation will take far more development effort than expected, and the implementation gets cut or drastically simplified because of time constraints.

Then ChatGPT burst onto the scene. Suddenly everyone thought, “AI will fix everything!” Finally, we wouldn’t have to wrestle with flaky tests, CSS selectors, and endless debugging loops.

But as with most things that sound too good to be true, the devil is in the details. In this article, I’ll show you why AI can indeed be a game-changer for test automation, but it’s not a magic wand - and what you can do to harness its real benefits without falling into common traps.

Why We All Dream of AI-Driven Tests

Before diving into the technical details, let’s consider why we want to use AI for testing in the first place. The reason is simple: end-to-end tests are incredibly valuable yet extremely time-consuming.

The real dilemma is what I call the “expert-programmer gap.” As a business analyst or product owner, you know exactly how the application should work. You understand the business processes, requirements, and user scenarios better than any developer. You could define perfect test cases - if only you could program them yourself! As this article on test automation shows, the biggest challenge isn’t figuring out “what” to test, but “how” to translate your acceptance criteria into automated test scripts.

To bridge this gap, we first need to understand what constitutes a high-quality E2E test from a technical perspective. Ideally, it should:

  - reflect the business process exactly as the expert described it
  - run reliably and repeatably, without flaking
  - survive minor UI changes without breaking
  - be cheap and fast enough to run on every build

Sounds simple? Unfortunately, it’s anything but.

Three Ways AI Might (Maybe) Save Your Tests

1. The Code Generator Approach: “Write Me a Test!”

The most obvious approach is asking an LLM to generate test code. This could be through general-purpose assistants like ChatGPT or Claude, or directly in your development environment with tools like Cursor, GitHub Copilot, or similar AI-powered code editors. You describe what you want to test in plain language, and the AI generates Playwright, Cypress, or Selenium code.

// AI-generated E2E test
test('User can login and update profile', async ({ page }) => {
  await page.goto('https://example.com');
  await page.fill('input[name="username"]', 'testuser');
  await page.fill('input[name="password"]', 'password123');
  // Trigger the click and wait for the resulting navigation together,
  // otherwise the navigation can finish before the wait is registered
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);
  // and so on...
});

Does it work? Partially. Modern AI-powered editors like Cursor can even access and understand your application’s source code. For developers already familiar with the codebase, this approach can produce workable results.

The problems: For business analysts and product owners, this approach remains largely inaccessible - you still have to read, run, and debug the generated code. Even for developers, significant limitations exist:

  - The AI works from static source code and descriptions, not the running application, so generated selectors and timing assumptions are often wrong on the first run
  - Generated tests still have to be executed, debugged, and repaired by hand
  - Each generated test reflects a snapshot in time and must be regenerated or patched whenever the UI changes

It’s like trying to guide someone through assembling complex IKEA furniture over the phone - even with instructions, they miss the spatial awareness you’d have in person.

2. The Full-Time AI Tester: “Let AI Drive the Browser!”

The next logical step: let the AI do the testing itself! In this approach, an LLM takes control of a browser, receives a test case in natural language and executes it independently.

The AI sees the screen via screenshots, analyzes the DOM and decides what to do next at each step. Every test run repeats this entire process from scratch.
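
To make this concrete, here’s a minimal sketch of that observe-decide-act loop, assuming Playwright for browser control. The decideNextAction call is a hypothetical stand-in for a request to GPT-4 with Vision, Claude, or similar - the real prompt and response handling would be far more involved:

import { Page } from 'playwright';

// The possible actions the model can return (assumed format)
type AiAction =
  | { type: 'click'; selector: string }
  | { type: 'fill'; selector: string; value: string }
  | { type: 'done' };

// Hypothetical LLM call - not a real library API
declare function decideNextAction(input: {
  goal: string;
  screenshot: Buffer;
  dom: string;
}): Promise<AiAction>;

// One iteration: the model sees the current state and picks the next
// action. Every test run repeats this from scratch, one LLM call per step.
async function runStep(page: Page, goal: string): Promise<boolean> {
  const screenshot = await page.screenshot(); // what the model "sees"
  const dom = await page.content();           // what the model "reads"
  const action = await decideNextAction({ goal, screenshot, dom });

  if (action.type === 'click') await page.click(action.selector);
  if (action.type === 'fill') await page.fill(action.selector, action.value);
  return action.type !== 'done'; // keep looping until the goal is reached
}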

Does it work? Surprisingly, yes—and often quite well! Modern LLMs like GPT-4 with Vision or Claude can navigate complex websites with impressive accuracy. This approach particularly appeals to business analysts and product owners since you can describe tests in natural language without writing a single line of code.

The problems: As impressive as this approach is, it comes with serious practical drawbacks:

  - Speed: every single step waits on one or more LLM calls, so a run that takes seconds as plain test code takes minutes
  - Cost: each run sends fresh screenshots and DOM snapshots to the model, so you pay full token costs on every execution
  - Repeatability: the model may take a slightly different path on each run and never learns from previous executions

It’s like hiring a brilliant but painfully slow and expensive QA person who starts from scratch every time and never learns from previous experience.

3. The Hybrid Approach: “Let AI Record Tests, Then We’ll Run Them!”

The third approach combines AI’s strengths with conventional test automation’s efficiency:

  1. AI executes the test once and records all actions
  2. These actions get translated into standard test code (sketched below)
  3. Future executions use this code without AI involvement
  4. When UI changes, AI can regenerate the test if needed
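
Step 2 is where the savings come from. Here’s a minimal sketch, assuming the AI’s recorded actions arrive in a simple structured format - the RecordedAction type below is an illustration, not any real tool’s schema:

// A recorded action from the AI's one-time exploration (assumed format)
type RecordedAction =
  | { kind: 'goto'; url: string }
  | { kind: 'click'; selector: string }
  | { kind: 'fill'; selector: string; value: string };

// Translate the recording into a plain Playwright test: from here on it
// runs without any AI involvement and without per-run token costs.
function toPlaywrightTest(name: string, actions: RecordedAction[]): string {
  const body = actions.map((a): string => {
    switch (a.kind) {
      case 'goto':  return `  await page.goto('${a.url}');`;
      case 'click': return `  await page.click('${a.selector}');`;
      case 'fill':  return `  await page.fill('${a.selector}', '${a.value}');`;
    }
  });
  return [`test('${name}', async ({ page }) => {`, ...body, `});`].join('\n');
}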

Does it work? This approach offers a compelling compromise, especially for business analysts and software testers: You describe tests in natural language, AI handles the technical implementation, and the result can run repeatedly without additional AI costs. When the application changes, you can have the AI update tests without programming knowledge.

The challenges: Translating AI actions into robust, maintainable test code isn’t trivial and requires technical expertise. A good tool for this approach should give product owners and business analysts an easy way to review, edit, and manage tests without touching code directly.

The “Did We Overestimate This?” Phase: Three Surprising Problems with AI Test Automation

As a product owner or business analyst, you might have enthusiastically announced, “With AI, we can triple our test coverage and reduce developer dependencies!” But anyone who’s experimented with AI-driven tests knows the sobering feeling when initial excitement fades. Here are three common problems that don’t require deep technical knowledge to understand:

1. The Context Trap: When the AI’s Memory Fails

LLMs have limited context windows. For a complex website, the DOM alone can be several megabytes - at roughly four characters per token, a 2 MB DOM is already on the order of half a million tokens, several times the 128k-token window of even large models. Add screenshots and test steps, and you’ll quickly exceed any reasonable context limit.

AI: "Sorry, I can't remember what we were testing. This DOM is too massive for my context..."

It’s like trying to play complex chess while only remembering the last three moves.

2. The Perception Gap: “I See Something You Don’t See”

Here’s a fascinating problem: The AI sees a search button with a magnifying glass icon, but the DOM contains nothing about “search” or “magnifying glass” - just an SVG with no semantic meaning. How should the AI know which DOM element corresponds to which visual element?

3. The Identification Challenge: “What Do We Call This Thing?”

Once the AI correctly identifies an element, it needs to create a reliable selector for it. This selector must be unique and stable not just for this test run but for all future runs; otherwise the recorded script breaks with the next minor UI change.

Advanced Solution Strategies

How do we overcome these hurdles? Here’s where things get interesting:

1. Multi-Agent Architecture to Bridge the Perception Gap

Instead of burdening a single AI agent with everything, we can deploy specialized agents:

  - A vision agent that interprets screenshots and locates elements the way a user sees them
  - A DOM agent that maps those visual elements to concrete markup and candidate selectors
  - A planner agent that keeps the overall test goal in view and decides the next step

The challenge lies in orchestrating communication between these agents efficiently. Like any team, more specialization means more coordination overhead. A good AI test automation system handles this orchestration elegantly.
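
As a rough sketch of what that orchestration might look like - the agent interfaces below are illustrative assumptions, not an established framework:

// Illustrative agent interfaces - names and responsibilities are assumptions
interface VisionAgent {
  // Finds a visual element ("the magnifying-glass button") on a screenshot
  locate(screenshot: Buffer, description: string): Promise<{ x: number; y: number }>;
}
interface DomAgent {
  // Maps screen coordinates back to a DOM element and a candidate selector
  selectorAt(dom: string, point: { x: number; y: number }): Promise<string>;
}
interface PlannerAgent {
  // Keeps the test goal in view and decides the next natural-language step
  nextStep(goal: string, history: string[]): Promise<string | null>;
}

// The orchestrator is where the coordination overhead lives: each step
// fans out to the specialists and merges their answers into one action.
async function orchestrateStep(
  goal: string,
  history: string[],
  screenshot: Buffer,
  dom: string,
  agents: { planner: PlannerAgent; vision: VisionAgent; dom: DomAgent },
): Promise<string | null> {
  const step = await agents.planner.nextStep(goal, history);
  if (!step) return null;                     // test goal reached
  const point = await agents.vision.locate(screenshot, step);
  return agents.dom.selectorAt(dom, point);   // selector to act on
}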

2. Intelligent Context Optimization

To tackle the context problem, we need effective compression without losing critical information. Instead of processing the entire DOM, we can strip unimportant attributes and elements and pass only the relevant parts - a ‘semantic DOM’ that contains just the elements that matter for the test.
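
A minimal sketch of such DOM pruning - the tag and attribute allowlists below are assumptions and would need tuning per application:

// Allowlists of what to keep - tune these per application (assumptions)
const KEEP_ATTRS = ['id', 'name', 'type', 'role', 'aria-label', 'placeholder', 'data-testid'];
const KEEP_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea', 'form', 'label']);

// Reduce a full DOM to a compact "semantic DOM": only interactive
// elements, only the attributes that help identify them.
function semanticDom(root: Element): string {
  const lines: string[] = [];
  for (const el of Array.from(root.querySelectorAll('*'))) {
    if (!KEEP_TAGS.has(el.tagName.toLowerCase())) continue;
    const attrs = KEEP_ATTRS
      .filter((a) => el.hasAttribute(a))
      .map((a) => `${a}="${el.getAttribute(a)}"`)
      .join(' ');
    const text = el.textContent?.trim().slice(0, 40) ?? '';
    lines.push(`<${el.tagName.toLowerCase()} ${attrs}> ${text}`);
  }
  return lines.join('\n'); // kilobytes instead of megabytes
}

With Playwright, this could run inside page.evaluate, so the model receives kilobytes instead of megabytes.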

Additionally, the LLM doesn’t need the entire conversation history for every request, as typical AI chats would do. Previous steps, reasoning, and results can be summarized concisely.

3. Robust Selector Strategies

To generate reliable selectors, we need an algorithm that deterministically picks the best available selector according to a defined priority - like this (a code sketch follows the list):

  1. First look for data-testid attributes (ideal when available)
  2. Use semantic attributes like labels and ARIA roles
  3. Try relative positioning to known elements
  4. Only as a last resort: use absolute positions or complex XPath selectors
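
A sketch of that cascade - the helper below is illustrative, and a production version would also have to verify that each candidate matches exactly one element on the page:

// Pick the most stable selector available, following the priority above.
function bestSelector(el: Element): string {
  const testId = el.getAttribute('data-testid');
  if (testId) return `[data-testid="${testId}"]`;          // 1. ideal when available

  const ariaLabel = el.getAttribute('aria-label');
  if (ariaLabel) return `[aria-label="${ariaLabel}"]`;     // 2. semantic attributes

  const role = el.getAttribute('role');
  const name = el.textContent?.trim();
  if (role && name) return `role=${role}[name="${name}"]`; // 2. Playwright role selector

  // 3. relative positioning / 4. absolute paths - brittle, last resort only
  return cssPath(el);
}

// Hypothetical helper: builds an absolute CSS path (nth-child chain)
declare function cssPath(el: Element): string;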

The Future: Where Do We Go From Here?

For those who master these challenges, exciting possibilities await - particularly for business analysts and product owners:

Imagine being able to automatically generate and run E2E tests right after creating a user story, without waiting for developers or QA resources!

Conclusion: AI as a Bridge Across the Expert-Programmer Gap

AI test automation isn’t a magical solution that creates perfect tests at the push of a button. Rather, it’s a powerful tool that can bridge the “expert-programmer gap” - if we understand and work with its limitations:

  - Use AI where it excels: translating natural-language requirements into concrete test actions
  - Keep repeated execution deterministic and AI-free to control cost, speed, and flakiness
  - Invest in context compression and robust selector strategies so recorded tests survive UI changes

As a business analyst or product owner, you know the requirements better than anyone. With the right AI tools, you can now translate that expertise directly into automated tests - without writing code yourself. The future of test automation isn’t about replacing human expertise, but augmenting it with intelligent AI systems.