How I Use AI to Generate QA Test Cases
A repeatable workflow for turning a requirement or user story into positive, negative, and edge-case test cases with an LLM — without letting the model invent coverage.
AI is genuinely good at one part of test design that humans find tedious: enumerating variations. The trick is to treat the model as a fast junior analyst whose output you review, not as an oracle that decides coverage for you.
Give the model the requirement, the acceptance criteria, and an explicit format. Vague prompts produce vague test cases. Structured prompts produce a table you can paste into your test management tool.
The workflow in four steps
I run the same loop for every feature, whether it is a login form or a payment API.
- Feed context — paste the user story and acceptance criteria verbatim.
- Ask for categories — positive, negative, boundary, and security cases, separately.
- Review and prune — delete duplicates, fix wrong assumptions, add domain cases the model missed.
- Lock the format — export as a table with ID, title, precondition, steps, and expected result.
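The last two steps are mostly mechanical, so they are easy to script. Below is a minimal sketch, not my actual tooling. It assumes the model returns one case per line as five pipe-separated fields. The names `QaCase`, `parse_cases`, `prune_duplicates`, and `to_csv` are made up for illustration. Deduplication here only catches cases whose titles match after normalizing case and whitespace, so it is a first filter before the human review, not a replacement for it.

```python
import csv
import io
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class QaCase:
    """One reviewed test case (hypothetical structure for this sketch)."""
    id: str
    title: str
    precondition: str
    steps: str
    expected: str


def parse_cases(raw: str) -> list[QaCase]:
    """Parse lines shaped like 'id | title | precondition | steps | expected result'."""
    cases = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 5 or not parts[0]:
            continue  # group headings, blank lines, malformed rows
        if parts[0].lower() == "id" or set(parts[0]) <= set("-:"):
            continue  # header row or a markdown-style separator row
        cases.append(QaCase(*parts))
    return cases


def prune_duplicates(cases: list[QaCase]) -> list[QaCase]:
    """Keep the first case for each title, ignoring case and extra spaces."""
    seen = set()
    kept = []
    for case in cases:
        key = " ".join(case.title.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(case)
    return kept


def to_csv(cases: list[QaCase]) -> str:
    """Render the cases as CSV text with a fixed header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ID", "Title", "Precondition", "Steps", "Expected result"])
    for case in cases:
        writer.writerow(astuple(case))
    return buf.getvalue()
```

Most test management tools accept CSV import, but column names differ from tool to tool, so treat the header row as a placeholder to adjust.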
A prompt that actually works
Here is the base prompt I reuse. Notice it constrains the shape of the answer, not just the topic.
You are a senior QA analyst. Generate test cases for the requirement below.
Group them under: Positive, Negative, Boundary, Security.
For each case output: id | title | precondition | steps | expected result.
Do not invent requirements. If something is ambiguous, list it under "Open Questions".
Requirement:
<paste user story + acceptance criteria>
The Open Questions instruction is the important part. It turns the model’s tendency to hallucinate into a list of clarifications for the product owner instead of silently wrong test cases.
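If you reuse the prompt often, it helps to build it from code so the wording never drifts. Here is a rough sketch under a few assumptions: `build_prompt` and `extract_open_questions` are hypothetical helper names, the acceptance criteria arrive as a list of strings, and the model puts its clarifications under a line that reads "Open Questions" (possibly with markdown decoration). Real responses vary, so the extraction part is deliberately forgiving and still worth eyeballing.

```python
PROMPT_TEMPLATE = """\
You are a senior QA analyst. Generate test cases for the requirement below.
Group them under: Positive, Negative, Boundary, Security.
For each case output: id | title | precondition | steps | expected result.
Do not invent requirements. If something is ambiguous, list it under "Open Questions".
Requirement:
{requirement}
"""

GROUP_HEADINGS = {"positive", "negative", "boundary", "security"}


def build_prompt(user_story: str, acceptance_criteria: list[str]) -> str:
    """Fill the base prompt; refuse to run without story and criteria."""
    if not user_story.strip():
        raise ValueError("user story is empty")
    if not acceptance_criteria:
        raise ValueError("acceptance criteria are missing")
    criteria = "\n".join(f"- {c.strip()}" for c in acceptance_criteria)
    requirement = f"{user_story.strip()}\n\nAcceptance criteria:\n{criteria}"
    return PROMPT_TEMPLATE.format(requirement=requirement)


def extract_open_questions(response: str) -> list[str]:
    """Collect the lines listed under an 'Open Questions' heading."""
    questions: list[str] = []
    in_section = False
    for line in response.splitlines():
        text = line.strip()
        if not text:
            continue
        # Tolerate decorations such as '## Open Questions:' or '**Security**'.
        heading = text.strip('#*:" ').lower()
        if heading == "open questions":
            in_section = True
        elif heading in GROUP_HEADINGS:
            in_section = False
        elif in_section:
            questions.append(text.lstrip("-*• ").strip())
    return questions
```

Failing fast on missing acceptance criteria is a design choice: without them the model has nothing to anchor to, and that is exactly when it starts inventing coverage.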
What the model is good and bad at
The strengths and weaknesses are consistent enough to plan around:
- Good: boundary permutations, input validation matrices, and restating acceptance criteria as steps.
- Weak: business rules that live in someone’s head, real data dependencies, and anything requiring system-specific knowledge.
Never ship AI-generated cases without a human pass. The model will confidently produce a “valid” case that violates a business rule it was never told about.
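To make "input validation matrix" concrete, here is a small sketch of the kind of enumeration the model does well. It is illustrative only: the three field states and the `validation_matrix` name are my own choices, and the rule "reject unless every field is valid" is an assumption, which is precisely the sort of unstated business rule a reviewer has to confirm.

```python
from itertools import product

# Assumed states for each input field; a real form may need more.
FIELD_STATES = ("valid", "empty", "invalid")


def validation_matrix(field_names: list[str]) -> list[tuple[dict[str, str], str]]:
    """Enumerate every combination of field states with an expected outcome."""
    rows = []
    for combo in product(FIELD_STATES, repeat=len(field_names)):
        assignment = dict(zip(field_names, combo))
        # Assumption: the form accepts input only when all fields are valid.
        expected = "accept" if all(s == "valid" for s in combo) else "reject"
        rows.append((assignment, expected))
    return rows


if __name__ == "__main__":
    for assignment, expected in validation_matrix(["email", "password"]):
        print(assignment, "->", expected)
```

Two fields give nine rows and three fields give twenty-seven, so the matrix grows fast. Comparing the model's Negative cases against a generated matrix like this is a quick way to spot combinations it skipped.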
Turning output into a checklist
Once reviewed, I collapse the table into a quick pre-merge checklist for the feature:
- Positive path verified against acceptance criteria
- Each required field has both an empty-input case and an invalid-input case
- Boundary values (min, max, off-by-one) covered
- Auth/permission negative case exists
- Open questions resolved with product owner
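The boundary item on this checklist is the easiest one to verify mechanically. As a hedged sketch, assuming an inclusive integer range (the `boundary_values` helper and the example limits are hypothetical, not taken from any real requirement), this lists the values a reviewed table should contain:

```python
def boundary_values(minimum: int, maximum: int) -> dict[str, list[int]]:
    """Return boundary test values for an inclusive integer range."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    # On-boundary values and their inside neighbours, without duplicates.
    candidates = {minimum, minimum + 1, maximum - 1, maximum}
    valid = sorted(v for v in candidates if minimum <= v <= maximum)
    # Off-by-one values just outside the range should be rejected.
    invalid = [minimum - 1, maximum + 1]
    return {"valid": valid, "invalid": invalid}


if __name__ == "__main__":
    # Example: a password length limit of 8 to 64 characters (made-up numbers).
    print(boundary_values(8, 64))
```

If any of these values has no matching row in the reviewed table, the boundary item on the checklist is not done yet.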
Why this scales
The value is not that AI writes tests. It is that the first draft of coverage now takes minutes, so the human time goes entirely into judgment: pruning noise and adding the cases only a tester who knows the product would think of. That is the part worth paying a person for.
Good AI-assisted testing is not about generating more cases. It is about spending your review time on the cases that matter.