
AI-Generated Testing: Why Most Approaches Fail

AI-generated tests frequently fail in production because they lack systematic quality standards. This document explains the problem and presents a solution combining three components: Playwright-Utils, TEA (Test Architect), and Playwright MCPs.

When teams use AI to generate tests without structure, they often produce what can be called “slop factory” outputs:

| Issue | Description |
| --- | --- |
| Redundant coverage | Multiple tests covering the same functionality |
| Incorrect assertions | Tests that pass but don’t actually verify behavior |
| Flaky tests | Non-deterministic tests that randomly pass or fail |
| Unreviewable diffs | Generated code too verbose or inconsistent to review |

The core problem is that prompt-driven testing paths lean into nondeterminism, which is the exact opposite of the determinism that testing exists to protect.

The solution combines three components that work together to enforce quality:

Playwright-Utils bridges the gap between Cypress ergonomics and Playwright’s capabilities by standardizing commonly reinvented primitives as utility functions.

| Utility | Purpose |
| --- | --- |
| api-request | API calls with schema validation |
| auth-session | Authentication handling |
| intercept-network-call | Network mocking and interception |
| recurse | Retry logic and polling |
| log | Structured logging |
| network-recorder | Record and replay network traffic |
| burn-in | Smart test selection for CI |
| network-error-monitor | HTTP error detection |
| file-utils | CSV/PDF handling |

These utilities eliminate the need to reinvent authentication, API calls, retries, and logging for every project.
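
For context, the sketch below shows roughly the boilerplate a utility such as recurse or api-request is meant to standardize. It uses only plain @playwright/test APIs; the endpoint, response shape, and a configured baseURL are assumptions for illustration, not part of the library.

```ts
// Hand-rolled polling and API boilerplate of the kind that shared utilities
// replace. Plain @playwright/test only; the /api/orders endpoint and response
// shape are hypothetical, and a baseURL is assumed in playwright.config.ts.
import { test, expect } from '@playwright/test';

test('order eventually reaches SHIPPED status', async ({ request }) => {
  // Without a shared utility, every project re-implements this polling loop.
  await expect
    .poll(
      async () => {
        const res = await request.get('/api/orders/42'); // hypothetical endpoint
        expect(res.ok()).toBeTruthy();
        const body = await res.json();
        return body.status;
      },
      { timeout: 30_000, intervals: [1_000, 2_000, 5_000] },
    )
    .toBe('SHIPPED');
});
```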

TEA (Test Architect) is a quality operating model packaged as eight executable workflows spanning test design, CI/CD gates, and release readiness. TEA encodes test architecture expertise into repeatable processes.

| Workflow | Purpose |
| --- | --- |
| test-design | Risk-based test planning per epic |
| framework | Scaffold production-ready test infrastructure |
| ci | CI pipeline with selective testing |
| atdd | Acceptance test-driven development |
| automate | Prioritized test automation |
| test-review | Test quality audits (0-100 score) |
| nfr-assess | Non-functional requirements assessment |
| trace | Coverage traceability and gate decisions |
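
As a concrete illustration of what “selective testing” in the ci workflow can mean, tests can be tagged and filtered so a pull-request run executes only a high-risk subset. This is generic Playwright configuration, not TEA’s generated output; the project names and the @smoke tag convention are assumptions.

```ts
// playwright.config.ts — illustrative only; project names and the @smoke tag
// convention are assumptions, not artifacts produced by the ci workflow.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,
  projects: [
    // Fast, risk-prioritized subset for pull requests.
    { name: 'pr-smoke', grep: /@smoke/ },
    // Full regression suite for the main branch or nightly runs.
    { name: 'full-regression' },
  ],
});
```

A pull-request pipeline would then run `npx playwright test --project=pr-smoke`, while the full suite runs on merge or nightly.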

Playwright MCPs (Model Context Protocol servers) enable real-time verification during test generation. Instead of inferring selectors and behavior from documentation, they allow agents to:

  • Run flows and confirm the DOM against the accessibility tree
  • Validate network responses in real time
  • Discover actual functionality through interactive exploration
  • Verify generated tests against live applications
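
The payoff shows up in the generated tests themselves. The sketch below is a hypothetical example of the kind of test this verification supports: selectors grounded in the accessibility tree (role-based locators) and behavior checked against a real network response. The route, roles, labels, and baseURL are assumptions for illustration.

```ts
// Hypothetical example of an MCP-verified test: locators come from the
// accessibility tree and the network response is asserted, not assumed.
import { test, expect } from '@playwright/test';

test('submitting the login form returns an authenticated session', async ({ page }) => {
  await page.goto('/login'); // assumes a configured baseURL

  // Role- and label-based locators mirror what the accessibility tree exposes.
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('s3cret');

  // Capture the real response instead of trusting the UI alone.
  const [response] = await Promise.all([
    page.waitForResponse((res) => res.url().includes('/api/login')),
    page.getByRole('button', { name: 'Sign in' }).click(),
  ]);

  expect(response.status()).toBe(200);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```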

The three components form a quality pipeline:

| Stage | Component | Action |
| --- | --- | --- |
| Standards | Playwright-Utils | Provides production-ready patterns and utilities |
| Process | TEA Workflows | Enforces systematic test planning and review |
| Verification | Playwright MCPs | Validates generated tests against live applications |

Before (AI-only): 20 tests with redundant coverage, incorrect assertions, and flaky behavior.

After (Full Stack): Risk-based selection, verified selectors, validated behavior, reviewable code.

Traditional AI testing approaches fail because they:

  • Lack quality standards — No consistent patterns or utilities
  • Skip planning — Jump straight to test generation without risk assessment
  • Can’t verify — Generate tests without validating against actual behavior
  • Don’t review — No systematic audit of generated test quality

The three-part stack addresses each gap:

| Gap | Solution |
| --- | --- |
| No standards | Playwright-Utils provides production-ready patterns |
| No planning | TEA test-design creates risk-based test plans |
| No verification | Playwright MCPs validate against live applications |
| No review | TEA test-review audits quality with scoring |

This approach is sometimes called context engineering—loading domain-specific standards into AI context automatically rather than relying on prompts alone. TEA’s tea-index.csv manifest loads relevant knowledge fragments so the AI doesn’t relearn testing patterns each session.
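
To make the mechanism concrete, here is a minimal sketch of the context-engineering idea: read a manifest, pick the knowledge fragments relevant to the current task, and prepend them to the agent’s context. This is not TEA’s implementation; the column layout (topic, fragment path) and helper names are assumptions, only the tea-index.csv file name comes from the documentation above.

```ts
// Sketch of the context-engineering idea only. TEA's actual manifest format
// and loading logic are not shown here; the columns (topic, fragment_path)
// and helper names are assumptions for illustration.
import { readFileSync } from 'node:fs';

function loadFragmentsForTopic(manifestPath: string, topic: string): string {
  const rows = readFileSync(manifestPath, 'utf8')
    .trim()
    .split('\n')
    .slice(1) // skip the header row
    .map((line) => line.split(','));

  // Keep only the knowledge fragments relevant to the current task, then
  // concatenate them so they can be prepended to the agent's context.
  return rows
    .filter(([rowTopic]) => rowTopic === topic)
    .map(([, fragmentPath]) => readFileSync(fragmentPath.trim(), 'utf8'))
    .join('\n\n');
}

// Example: load the fragments tagged "network-mocking" before generating tests.
const context = loadFragmentsForTopic('tea-index.csv', 'network-mocking');
```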