AI-Generated Testing: Why Most Approaches Fail
AI-generated tests frequently fail in production because they lack systematic quality standards. This document explains the problem and presents a solution combining three components: Playwright-Utils, TEA (Test Architect), and Playwright MCPs.
The Problem with AI-Generated Tests
When teams use AI to generate tests without structure, they often produce what can be called "slop factory" outputs:
| Issue | Description |
|---|---|
| Redundant coverage | Multiple tests covering the same functionality |
| Incorrect assertions | Tests that pass but don't actually verify behavior |
| Flaky tests | Non-deterministic tests that randomly pass or fail |
| Unreviewable diffs | Generated code too verbose or inconsistent to review |
The core problem is that prompt-driven test generation leans into nondeterminism, the exact opposite of the determinism that testing exists to protect.
The Solution: A Three-Part Stack
The solution combines three components that work together to enforce quality:
Playwright-Utils
Bridges the gap between Cypress ergonomics and Playwright's capabilities by standardizing commonly reinvented primitives through utility functions.
| Utility | Purpose |
|---|---|
| api-request | API calls with schema validation |
| auth-session | Authentication handling |
| intercept-network-call | Network mocking and interception |
| recurse | Retry logic and polling |
| log | Structured logging |
| network-recorder | Record and replay network traffic |
| burn-in | Smart test selection for CI |
| network-error-monitor | HTTP error detection |
| file-utils | CSV/PDF handling |
These utilities eliminate the need to reinvent authentication, API calls, retries, and logging for every project.
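As a rough illustration of the patterns these utilities standardize, the sketch below shows a spec that creates data over the API and polls for a result instead of hard-waiting. The import path and the signatures of `apiRequest`, `recurse`, and `log` are assumptions inferred from the table above, not the library's documented API.

```ts
// Minimal sketch only: the playwright-utils import path and the apiRequest,
// recurse, and log signatures are assumptions, not the documented API.
import { test, expect } from '@playwright/test';
import { apiRequest, recurse, log } from 'playwright-utils';

test('order is eventually fulfilled after checkout', async ({ page, request }) => {
  // API call with schema validation instead of a hand-rolled fetch + asserts
  const order = await apiRequest({
    request,
    method: 'POST',
    url: '/api/orders',
    body: { sku: 'ABC-123', qty: 1 },
  });

  log(`created order ${order.id}`); // structured logging instead of console.log

  // Retry/polling instead of a fixed waitForTimeout
  await recurse(
    () => apiRequest({ request, method: 'GET', url: `/api/orders/${order.id}` }),
    (res: { status: string }) => res.status === 'FULFILLED',
    { timeout: 30_000, interval: 1_000 },
  );

  await page.goto(`/orders/${order.id}`);
  await expect(page.getByRole('heading', { name: 'Order confirmed' })).toBeVisible();
});
```

The point is not the exact signatures but that retries, API calls, and logging come from one shared toolkit instead of being reinvented per project.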
TEA (Test Architect Agent)
A quality operating model packaged as eight executable workflows spanning test design, CI/CD gates, and release readiness. TEA encodes test architecture expertise into repeatable processes.
| Workflow | Purpose |
|---|---|
| test-design | Risk-based test planning per epic |
| framework | Scaffold production-ready test infrastructure |
| ci | CI pipeline with selective testing |
| atdd | Acceptance test-driven development |
| automate | Prioritized test automation |
| test-review | Test quality audits (0-100 score) |
| nfr-assess | Non-functional requirements assessment |
| trace | Coverage traceability and gate decisions |
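To make the workflow outputs concrete, here is a hypothetical TypeScript shape for a test-review audit feeding a trace gate decision. The field names and the PASS/CONCERNS/FAIL levels are illustrative assumptions; the table above only specifies a 0-100 score and gate decisions.

```ts
// Hypothetical output shapes; field names and gate levels are illustrative
// assumptions, not TEA's documented schema.
interface TestReviewResult {
  score: number; // 0-100 quality score from the test-review workflow
  findings: Array<{
    file: string;
    issue: 'redundant-coverage' | 'weak-assertion' | 'flaky-pattern';
    detail: string;
  }>;
}

type GateDecision = 'PASS' | 'CONCERNS' | 'FAIL';

interface TraceEntry {
  requirement: string; // e.g. a story or acceptance-criterion ID
  coveredBy: string[]; // spec files that prove coverage
}

// A CI gate might combine the review score with coverage traceability:
function evaluateGate(review: TestReviewResult, trace: TraceEntry[]): GateDecision {
  const uncovered = trace.filter((t) => t.coveredBy.length === 0);
  if (review.score < 60 || uncovered.length > 0) return 'FAIL';
  if (review.score < 80) return 'CONCERNS';
  return 'PASS';
}
```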
Playwright MCPs
Model Context Protocols enable real-time verification during test generation. Instead of inferring selectors and behavior from documentation, MCPs allow agents to:
- Run flows and confirm the DOM against the accessibility tree
- Validate network responses in real time
- Discover actual functionality through interactive exploration
- Verify generated tests against live applications (see the sketch below)
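Below is a plain @playwright/test spec of the kind an MCP-assisted agent could produce after exploring a live app: locators come from the accessibility tree (getByRole) and the network response is asserted directly. The /api/search route and the UI labels are placeholders, not a real application.

```ts
import { test, expect } from '@playwright/test';

// Example of MCP-verified output: selectors drawn from the accessibility
// tree, and the network response asserted alongside the rendered DOM.
test('search returns results and renders them', async ({ page }) => {
  await page.goto('/search');

  // Start waiting for the response before triggering the action
  const responsePromise = page.waitForResponse(
    (res) => res.url().includes('/api/search') && res.request().method() === 'GET',
  );
  await page.getByRole('searchbox', { name: 'Search' }).fill('playwright');
  await page.getByRole('button', { name: 'Search' }).click();

  // Validate the network response in real time, not just the UI
  const response = await responsePromise;
  expect(response.status()).toBe(200);
  const body = await response.json();
  expect(body.results.length).toBeGreaterThan(0);

  // Confirm the DOM via accessibility-tree locators verified against the live app
  await expect(page.getByRole('list', { name: 'Search results' })).toBeVisible();
});
```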
How They Work Together
The three components form a quality pipeline:
| Stage | Component | Action |
|---|---|---|
| Standards | Playwright-Utils | Provides production-ready patterns and utilities |
| Process | TEA Workflows | Enforces systematic test planning and review |
| Verification | Playwright MCPs | Validates generated tests against live applications |
Before (AI-only): 20 tests with redundant coverage, incorrect assertions, and flaky behavior.
After (Full Stack): Risk-based selection, verified selectors, validated behavior, reviewable code.
Why This Matters
Traditional AI testing approaches fail because they:
- Lack quality standards: no consistent patterns or utilities
- Skip planning: jump straight to test generation without risk assessment
- Can't verify: generate tests without validating against actual behavior
- Don't review: no systematic audit of generated test quality
The three-part stack addresses each gap:
| Gap | Solution |
|---|---|
| No standards | Playwright-Utils provides production-ready patterns |
| No planning | TEA test-design creates risk-based test plans |
| No verification | Playwright MCPs validate against live applications |
| No review | TEA test-review audits quality with scoring |
This approach is sometimes called context engineering: loading domain-specific standards into AI context automatically rather than relying on prompts alone. TEA's tea-index.csv manifest loads relevant knowledge fragments so the AI doesn't relearn testing patterns each session.
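As a rough sketch of that manifest-driven loading, the snippet below reads a CSV index and injects only the fragments relevant to the current task. The column layout (name, topics, path) is an assumption; the source only names the tea-index.csv file.

```ts
// Sketch of manifest-driven context loading. The (name, topics, path) column
// layout is an assumption; the source only names the tea-index.csv file.
import { readFileSync } from 'node:fs';

interface Fragment {
  name: string;
  topics: string[]; // keywords used to match fragments to the current task
  path: string;     // file containing the knowledge fragment
}

function loadIndex(csvPath: string): Fragment[] {
  return readFileSync(csvPath, 'utf8')
    .trim()
    .split('\n')
    .slice(1) // skip the header row
    .map((row) => {
      const [name, topics, path] = row.split(',');
      return { name, topics: topics.split(';'), path };
    });
}

// Pull only the fragments that match the task, instead of re-teaching the AI
// every testing pattern in every session.
function contextFor(task: string, index: Fragment[]): string {
  return index
    .filter((f) => f.topics.some((t) => task.toLowerCase().includes(t)))
    .map((f) => readFileSync(f.path, 'utf8'))
    .join('\n\n');
}

const context = contextFor('design network interception tests', loadIndex('tea-index.csv'));
```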