QA Engineer · Agentic SDLC Personas

The QA Engineer is the persona that designs, runs, and evolves the test strategy for the whole product. In an AI-native SDLC, the QA Engineer operates a Test Strategist agent, a bank of slash prompts, and a validated MCP catalog — not a manual test console.

Executive summary

The QA Engineer owns the Verification phase. Their job is to prove that the code shipped by Developers actually satisfies the EARS requirements and Given-When-Then acceptance criteria locked into SPECIFICATION.md. In an AI-native SDLC, that proof is produced by a Test Strategist agent, four slash prompts, scoped instructions, and a small set of validated MCPs that reach into Azure DevOps Test Plans, GitHub Actions, and Playwright.

The primary outputs are a living test matrix, a mutation-tested suite, a flake register with root causes, and coverage-gap reports that route back to Developers as fresh issues. The QA Engineer does not compete with Developers on unit tests; they orchestrate the broader verification portfolio: integration, contract, end-to-end, resilience, and exploratory testing supported by AI.

Rather than duplicating Developer-written unit tests, the QA Engineer guarantees that the union of all verification layers actually demonstrates that the product meets its contract. They are the last principled defender of one invariant: no undocumented behavior reaches production.

Role and responsibilities

Think of the QA Engineer like a flight-test engineer. The pilots fly, the mechanics build, but someone has to design the envelope-expansion plan that proves the aircraft safe across every regime it will ever see. In an AI-native SDLC, the QA Engineer is accountable for the envelope of automated and exploratory evidence that lets the team ship every day with confidence.

Primary responsibilities:

Translate each EARS requirement into at least one executable test and one manual exploratory charter
Own the test matrix across unit, contract, integration, end-to-end, performance, and accessibility lanes
Run mutation testing on critical modules and route survivors back as issues
Triage flaky tests within 24 hours; never silence them
Maintain the Playwright MCP fixtures for end-to-end and visual coverage
Operate the Test Strategist agent and the /test-plan, /mutation-scan, /flake-triage, /coverage-gap prompts
Publish weekly verification dashboards from Azure Monitor and GitHub Actions data
Coach Developers on test design during pair sessions and PR reviews

Jobs to be done

As a QA Engineer, I want a generated test plan for every feature that links back to spec IDs, so that no acceptance criterion ships unverified.
As a QA Engineer, I want mutation scores per module, so that I know where assertions are weak before an incident exposes them.
As a QA Engineer, I want flaky tests triaged within one working day, so that the signal stays trustworthy.
As a QA Engineer, I want coverage gaps routed as issues with suggested tests, so that Developers close them in the same sprint.
As a QA Engineer, I want Playwright runs recorded and indexed, so that post-incident analysis is a query, not an archaeology expedition.
As a QA Engineer, I want the test matrix exported to Azure DevOps Test Plans, so that non-engineering stakeholders can browse verification evidence.
As a QA Engineer, I want accessibility checks embedded in every end-to-end run, so that WCAG regressions are caught at merge time.
As a QA Engineer, I want a weekly summary posted to Microsoft Teams, so that the whole org sees product quality trend lines.

Pain points before AI-native

Tests trail features. QA writes tests after code ships; scope drifts and regressions leak into the backlog.
Flake amnesia. Flaky tests are quarantined and forgotten. The same race condition bites twice a quarter, and the team loses trust in red builds.
Coverage theatre. Line coverage hits 80 percent while critical branches are never exercised; audits pass and incidents still happen.
Exploratory work undocumented. Charters live in notebooks; learnings never become automated regression, so the same bug returns in a different screen.
Handoffs lose context. A Developer fixes a bug and QA re-tests it, but the failing scenario that proved the fix is not attached to the PR, so nothing guards against the bug returning.
No shared dashboard. Verification status lives in five tools; leadership has no single, current view of product quality.

AI-native daily workflow

The QA Engineer works from Visual Studio Code with GitHub Copilot and from the terminal with Claude Code, invoking the Test Strategist agent throughout the day.

Morning setup

Pull main, open the repo in VS Code, let Copilot Chat load AGENTS.md and the scoped .github/instructions/tests.instructions.md.
Run /test-plan --since=yesterday to scan merged PRs and produce a delta plan: which new EARS requirements need coverage today.
Open Azure DevOps Test Plans and GitHub Actions dashboards to review overnight runs; queue flake candidates.
Review the overnight mutation scan output posted to the Verification channel in Microsoft Teams; pin the top three survivors as today’s priority.
Sync with the on-call SRE for any production incidents that should feed a new exploratory charter.
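The delta that `/test-plan --since=yesterday` reports can be thought of as a set difference: requirement IDs linked from PRs merged after the cutoff, minus the IDs that already have a linked test. The following is a minimal sketch under that assumption; the `MergedPr` shape and `deltaPlan` function are hypothetical, and the prompt's real inputs and output format are not specified in this document.

```typescript
// Hypothetical shape: a merged PR and the EARS requirement IDs it declares.
interface MergedPr {
  number: number;
  mergedAt: Date;
  specIds: string[];
}

// Returns requirement IDs linked from PRs merged at or after `since`
// that have no linked test yet, sorted for stable output.
function deltaPlan(prs: MergedPr[], since: Date, testedIds: Set<string>): string[] {
  const needsCoverage = new Set<string>();
  for (const pr of prs) {
    if (pr.mergedAt.getTime() < since.getTime()) continue; // merged before the cutoff
    for (const id of pr.specIds) {
      if (!testedIds.has(id)) needsCoverage.add(id);
    }
  }
  return Array.from(needsCoverage).sort();
}

// Illustrative data only.
const cutoff = new Date("2024-05-01T00:00:00Z");
const merged: MergedPr[] = [
  { number: 101, mergedAt: new Date("2024-04-30T18:00:00Z"), specIds: ["REQ-PAY-040"] },
  { number: 102, mergedAt: new Date("2024-05-01T09:30:00Z"), specIds: ["REQ-PAY-044", "REQ-PAY-045"] },
];
const plan = deltaPlan(merged, cutoff, new Set(["REQ-PAY-045"]));
console.log(plan.join(", ")); // REQ-PAY-044
```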

Midday execution

For each feature PR, invoke /test-plan with the linked spec section. The Test Strategist proposes missing lanes and writes skeleton tests.
Run /mutation-scan on modules changed in the last 24 hours. Survivors become issues tagged verification/weak-assertion.
Drive exploratory sessions with the Playwright MCP. Every finding that reproduces becomes a committed Playwright spec.
Pair with the Developer on red tests; never approve a PR where a test was deleted without an equivalent replacement.
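The rule against deleting a test without an equivalent replacement can be partly automated. This minimal sketch assumes each test declares the requirement IDs it verifies (the `TestIndex` map is hypothetical). It approximates "equivalent replacement" as "every requirement the deleted test covered is still covered by some test", so the final judgment of equivalence remains a review decision.

```typescript
// Hypothetical index: test ID -> requirement IDs the test claims to verify.
type TestIndex = Map<string, string[]>;

// Lists tests removed by a PR whose requirements no remaining test covers.
// "Equivalent replacement" is approximated as: every requirement the deleted
// test linked to is still linked by at least one test after the PR.
function unreplacedDeletions(before: TestIndex, after: TestIndex): string[] {
  const coveredAfter = new Set<string>();
  after.forEach((reqs) => reqs.forEach((r) => coveredAfter.add(r)));

  const flagged: string[] = [];
  before.forEach((reqs, testId) => {
    if (after.has(testId)) return; // test still exists
    if (reqs.some((r) => !coveredAfter.has(r))) flagged.push(testId);
  });
  return flagged.sort();
}

// Illustrative data: the PR deletes two tests and keeps one.
const beforePr: TestIndex = new Map([
  ["refund.full", ["REQ-PAY-043"]],
  ["refund.legacy", ["REQ-PAY-043"]],
  ["refund.partial", ["REQ-PAY-044"]],
]);
const afterPr: TestIndex = new Map([["refund.full", ["REQ-PAY-043"]]]);
console.log(unreplacedDeletions(beforePr, afterPr).join(", ")); // refund.partial
```

Here `refund.legacy` is not flagged because `refund.full` still covers REQ-PAY-043. `refund.partial` is flagged because nothing covers REQ-PAY-044 any more.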

Afternoon review

Run /flake-triage against the daily Actions logs. The agent clusters failures, proposes root causes, and opens issues with a suggested fix owner.
Publish the verification dashboard: coverage by module, mutation score, flake rate, open exploratory charters.
Update SPECIFICATION.md traceability: every requirement ID links to at least one passing test.
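The traceability rule (every requirement ID links to at least one passing test) can be checked mechanically in CI. This minimal sketch assumes two conventions: requirement IDs follow the REQ-PAY-044 pattern used in this document's examples, and tests cite those IDs in their titles. Both conventions, and the names below, are assumptions.

```typescript
// Assumed ID convention, modelled on REQ-PAY-044: REQ-<AREA>-<number>.
const REQ_ID = /REQ-[A-Z]+-\d+/g;

interface TestResult {
  title: string; // assumption: tests cite requirement IDs in their titles
  passed: boolean;
}

// Returns requirement IDs present in the spec that no passing test references.
function untracedRequirements(specText: string, results: TestResult[]): string[] {
  const specIds = new Set<string>(specText.match(REQ_ID) ?? []);
  const traced = new Set<string>();
  for (const r of results) {
    if (!r.passed) continue; // a failing test is not evidence of verification
    const ids: string[] = r.title.match(REQ_ID) ?? [];
    for (const id of ids) traced.add(id);
  }
  return Array.from(specIds).filter((id) => !traced.has(id)).sort();
}

// Illustrative data only.
const specMd = `
REQ-PAY-043: When a customer requests a full refund, the system shall refund the captured amount.
REQ-PAY-044: When a customer requests a partial refund, the system shall refund the requested amount.
`;
const latestRun: TestResult[] = [
  { title: "REQ-PAY-043 refunds the captured amount", passed: true },
  { title: "REQ-PAY-044 refunds the requested amount", passed: false },
];
console.log(untracedRequirements(specMd, latestRun).join(", ")); // REQ-PAY-044
```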

Recommended primitives

Agent

Agent	File	Purpose
`test-strategist`	`.github/agents/test-strategist.agent.md`	Designs test plans, runs mutation scans, triages flakes, routes coverage gaps

The Test Strategist runs on claude-sonnet-4-6 with the read, grep, bash, and edit tools. Extended thinking is enabled for mutation analysis only.

Slash prompts

Command	File	Purpose
`/test-plan`	`.github/prompts/test-plan.prompt.md`	Generate or update the test plan for a feature, linked to EARS IDs
`/mutation-scan`	`.github/prompts/mutation-scan.prompt.md`	Run mutation testing on changed modules and route survivors
`/flake-triage`	`.github/prompts/flake-triage.prompt.md`	Cluster failing Actions runs, propose root causes, open issues
`/coverage-gap`	`.github/prompts/coverage-gap.prompt.md`	Map untested branches to specific spec sections and suggest tests

Instructions scoped

Scope (`applyTo`)	File	Purpose
`tests/**/*`	`.github/instructions/tests.instructions.md`	AAA pattern, deterministic fixtures, no hidden sleeps
`tests/e2e/**/*`	`.github/instructions/playwright.instructions.md`	Playwright locators, trace capture, retry policy
`docs/specs/**/*.md`	`.github/instructions/traceability.instructions.md`	Every requirement links to at least one test ID
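The following minimal sketch shows one test of the kind tests.instructions.md asks for. It follows the Arrange-Act-Assert pattern and uses a deterministic fixture: the function under test receives an injected clock instead of reading the system time, so the test needs no sleep and gives the same result on every run. `isInvoiceOverdue` and `fixedClock` are hypothetical names, and the bare `throw` stands in for whatever assertion library the suite actually uses.

```typescript
// A clock is injected so tests never depend on the real time of day.
type Clock = () => Date;

// Hypothetical function under test: an invoice is overdue strictly after its due date.
function isInvoiceOverdue(dueDate: Date, now: Clock): boolean {
  return now().getTime() > dueDate.getTime();
}

// Deterministic fixture: a frozen clock instead of Date.now() or a sleep.
function fixedClock(iso: string): Clock {
  const instant = new Date(iso);
  return () => new Date(instant.getTime());
}

function testOverdueOneSecondAfterDueDate(): void {
  // Arrange
  const due = new Date("2024-05-31T23:59:59Z");
  const clock = fixedClock("2024-06-01T00:00:00Z");
  // Act
  const overdue = isInvoiceOverdue(due, clock);
  // Assert
  if (!overdue) throw new Error("expected invoice to be overdue");
}

testOverdueOneSecondAfterDueDate();
```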

Hooks

pre-push: run the fast test lane; block push on failure
post-merge: regenerate the test matrix and publish to Azure DevOps Test Plans
nightly: run the mutation scan on modules changed that day
pre-release: enforce minimum mutation score and traceability coverage thresholds
on-flake: open a GitHub issue automatically when a test fails twice in seven days
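The on-flake rule can be read as: flag a test once two of its failures fall within a rolling seven-day window. The sketch below covers only that detection step. It assumes failures arrive as hypothetical `Failure` records. Opening the GitHub issue itself is left out because the hook's implementation is not described here.

```typescript
// Hypothetical record of one failed CI run of one test.
interface Failure {
  testId: string;
  at: Date;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Returns test IDs with at least two failures no more than seven days apart,
// i.e. the tests the on-flake hook would open an issue for.
function flakeCandidates(failures: Failure[]): string[] {
  const byTest = new Map<string, number[]>();
  for (const f of failures) {
    const times: number[] = byTest.get(f.testId) ?? [];
    times.push(f.at.getTime());
    byTest.set(f.testId, times);
  }
  const candidates: string[] = [];
  byTest.forEach((times, testId) => {
    times.sort((a, b) => a - b);
    // Once sorted, the closest pair of failures is always adjacent.
    for (let i = 1; i < times.length; i++) {
      if (times[i] - times[i - 1] <= SEVEN_DAYS_MS) {
        candidates.push(testId);
        break;
      }
    }
  });
  return candidates.sort();
}

// Illustrative data only.
const observed: Failure[] = [
  { testId: "checkout.spec.ts > pays by card", at: new Date("2024-05-01T10:00:00Z") },
  { testId: "checkout.spec.ts > pays by card", at: new Date("2024-05-05T10:00:00Z") },
  { testId: "search.spec.ts > filters by date", at: new Date("2024-05-01T10:00:00Z") },
  { testId: "search.spec.ts > filters by date", at: new Date("2024-05-20T10:00:00Z") },
];
console.log(flakeCandidates(observed).join(", ")); // checkout.spec.ts > pays by card
```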

Validated MCPs

MCP	Purpose	Owner
GitHub MCP Server	Read PRs, Actions runs, annotate issues	GitHub
Azure DevOps MCP Server	Sync the test matrix into Azure DevOps Test Plans	Microsoft
Playwright MCP	Drive exploratory sessions and recorded end-to-end runs	Microsoft
Azure MCP Server	Query Azure Monitor and Application Insights for failure telemetry	Microsoft
Microsoft Learn Docs MCP	Look up current test guidance for Azure services under test	Microsoft

Real examples

Example 1: mutation survivors become PRs

Running /mutation-scan src/billing/ returns 7 survivors in invoice-total.ts. The Test Strategist drafts 7 new test cases and opens a single issue titled verification/weak-assertion: billing invoice-total. A Developer picks it up, writes the tests, and the next mutation run returns zero survivors in that module. The KPI dashboard shows the module's mutation score climbing from 58 to 94 percent.
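To make "weak assertion" concrete, here is a hand-rolled illustration of a single survivor. The `invoiceTotal` function, its discount threshold, and the mutant are invented for this sketch and are not the real invoice-total.ts. A mutation tool generates and runs many such mutants automatically. In this sketch the mutant flips `>=` to `>`, and a suite that never tests the boundary value cannot tell the two versions apart.

```typescript
// Hypothetical stand-in for invoice-total.ts; amounts in integer cents.
interface InvoiceLine {
  unitCents: number;
  qty: number;
}

const DISCOUNT_THRESHOLD_CENTS = 10000; // invented rule: 10% off from 100.00 upward

function subtotal(lines: InvoiceLine[]): number {
  return lines.reduce((sum, l) => sum + l.unitCents * l.qty, 0);
}

// Original logic: the discount applies when the subtotal is at least the threshold.
function invoiceTotal(lines: InvoiceLine[]): number {
  const s = subtotal(lines);
  return s >= DISCOUNT_THRESHOLD_CENTS ? s - Math.round(s / 10) : s;
}

// A mutant a mutation tool might generate: ">=" replaced by ">".
function invoiceTotalMutant(lines: InvoiceLine[]): number {
  const s = subtotal(lines);
  return s > DISCOUNT_THRESHOLD_CENTS ? s - Math.round(s / 10) : s;
}

type TotalFn = (lines: InvoiceLine[]) => number;

// Weak suite: only values far from the threshold, so both versions pass.
const weakSuite = (f: TotalFn): boolean =>
  f([{ unitCents: 5000, qty: 1 }]) === 5000 && f([{ unitCents: 20000, qty: 1 }]) === 18000;

// Strong suite adds the boundary case, which only the original passes.
const strongSuite = (f: TotalFn): boolean =>
  weakSuite(f) && f([{ unitCents: 10000, qty: 1 }]) === 9000;

// A mutant "survives" a suite that still passes when run against it.
console.log("survives weak suite:", weakSuite(invoiceTotalMutant));     // true
console.log("survives strong suite:", strongSuite(invoiceTotalMutant)); // false
```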

Example 2: flake triage ends a six-month bug

/flake-triage clusters 14 failures in checkout.spec.ts from the last week and points to a race between a Playwright page.click and an in-flight Azure Front Door cache revalidation. The Test Strategist proposes a fix: wait for the specific network response rather than a fixed timeout. The fix lands as a one-line change, the test stops flaking, and the reliability KPI for that suite returns to 99.8 percent.
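Clustering of the kind /flake-triage performs can be approximated by normalizing each failure message into a signature. Repeats of the same root cause then group together even when timings or counts differ. This is a minimal sketch; the normalization rules and the `FailedRun` shape are assumptions, not the prompt's actual logic.

```typescript
// Hypothetical record of one failed CI run.
interface FailedRun {
  runId: number;
  test: string;
  message: string;
}

// Collapses volatile details so repeats of one root cause share a signature.
// These normalization rules are illustrative assumptions.
function failureSignature(f: FailedRun): string {
  const normalized = f.message
    .replace(/0x[0-9a-f]+/gi, "<hex>") // addresses and hashes
    .replace(/\d+(\.\d+)?/g, "<n>")    // timings, counts, status codes
    .replace(/\s+/g, " ")
    .trim();
  return `${f.test} :: ${normalized}`;
}

// Groups failed runs by signature, largest cluster first.
function clusterFailures(runs: FailedRun[]): { signature: string; runIds: number[] }[] {
  const groups = new Map<string, number[]>();
  for (const r of runs) {
    const key = failureSignature(r);
    const ids: number[] = groups.get(key) ?? [];
    ids.push(r.runId);
    groups.set(key, ids);
  }
  return Array.from(groups, ([signature, runIds]) => ({ signature, runIds }))
    .sort((a, b) => b.runIds.length - a.runIds.length);
}

// Illustrative data only.
const failedRuns: FailedRun[] = [
  { runId: 1, test: "checkout.spec.ts", message: "Timeout 5000ms exceeded waiting for locator('#pay')" },
  { runId: 2, test: "checkout.spec.ts", message: "Timeout 7500ms exceeded waiting for locator('#pay')" },
  { runId: 3, test: "checkout.spec.ts", message: "Expected status 200, received 304" },
];
for (const c of clusterFailures(failedRuns)) {
  console.log(`${c.runIds.length} run(s): ${c.signature}`);
}
```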

Example 3: coverage gap routed back to the Developer

/coverage-gap inspects the last merged PR against src/payments/refunds.ts and finds that no current test exercises the partial_refund branch. The Test Strategist opens an issue with a drafted Given-When-Then scenario, links it to the spec requirement REQ-PAY-044, and assigns it to the original author. The issue closes within the same sprint, and traceability for the Payments module rises from 96 to 100 percent.
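A drafted Given-When-Then scenario for the missing branch might translate into a test like the one below. The text of REQ-PAY-044 is not quoted in this document, so the `refund` function, its rules, and the amounts are hypothetical stand-ins for src/payments/refunds.ts.

```typescript
// Hypothetical stand-in for src/payments/refunds.ts (amounts in cents).
type RefundKind = "full_refund" | "partial_refund";

interface RefundResult {
  kind: RefundKind;
  amountCents: number;
}

function refund(capturedCents: number, requestedCents: number): RefundResult {
  if (requestedCents <= 0 || requestedCents > capturedCents) {
    throw new RangeError("refund must be positive and no larger than the captured amount");
  }
  // The partial_refund branch is the one no existing test reached.
  return requestedCents === capturedCents
    ? { kind: "full_refund", amountCents: requestedCents }
    : { kind: "partial_refund", amountCents: requestedCents };
}

// Test drafted from the Given-When-Then scenario linked to REQ-PAY-044.
function partialRefundIssuesRequestedAmount(): void {
  // Given a captured payment of 50.00
  const captured = 5000;
  // When the customer requests a refund of 20.00
  const outcome = refund(captured, 2000);
  // Then a partial refund of exactly 20.00 is issued
  if (outcome.kind !== "partial_refund" || outcome.amountCents !== 2000) {
    throw new Error(`unexpected refund: ${JSON.stringify(outcome)}`);
  }
}

partialRefundIssuesRequestedAmount();
```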

Anti-patterns

Silencing flakes with retry counts. Retries hide the defect; the incident still ships.
Chasing line coverage. Branches and mutants matter; lines are a vanity number.
Manual regression packs. Every manual regression that finds a bug must become an automated test in the same week.
Writing tests after merge. Verification work done post-merge loses leverage; design tests alongside the spec.
End-to-end as the only net. End-to-end tests are slow and brittle; shift coverage left into the contract and unit layers.
Unowned fixtures. Shared Playwright fixtures without a named owner rot silently; assign an owner and review every sprint.
Treating agents as oracles. The Test Strategist proposes; the QA Engineer decides. Every AI-generated test is reviewed before merge.
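The first anti-pattern can be quantified. Assume each attempt of a flaky test fails independently with probability p. A CI policy that allows r retries then reports red only when all r + 1 attempts fail, so the visible failure rate drops to p^(r+1) while the underlying defect keeps its original rate. The sketch below works through that arithmetic with a made-up 20 percent per-attempt failure rate.

```typescript
// Probability that CI still reports red for a flaky test, assuming each attempt
// fails independently with probability p and up to `retries` reruns are allowed.
function redProbability(p: number, retries: number): number {
  return Math.pow(p, retries + 1);
}

for (const retries of [0, 1, 2]) {
  const pct = (redProbability(0.2, retries) * 100).toFixed(1);
  console.log(`retries=${retries}: red in ${pct}% of runs`);
}
// retries=0: red in 20.0% of runs
// retries=1: red in 4.0% of runs
// retries=2: red in 0.8% of runs
```

Two retries make a defect that bites one run in five nearly invisible in CI, yet the race still fires in production at the original rate.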

KPIs and impact metrics

Metric	Baseline (manual)	Target (agentic)	Source
Escape rate (bugs found post-release)	12 per release	< 2 per release	Application Insights, GitHub issues
Mutation score on critical modules	Unknown	> 85 percent	Mutation runner in GitHub Actions
Test suite reliability	88 percent	> 99.5 percent	Actions flake rate
Coverage gap closure time	3 sprints	< 1 sprint	GitHub Projects
Requirements with linked test	55 percent	100 percent	Traceability check in CI
Accessibility violations per release	18	0 critical	Playwright axe runs
Mean time to triage a flake	5 days	< 24 hours	GitHub Actions logs
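The pre-release hook ("enforce minimum mutation score and traceability coverage thresholds") could check the two corresponding targets in this table. This minimal sketch assumes per-module killed and survived mutant counts are already available from the mutation runner. The `ModuleStats` shape and `releaseGate` name are hypothetical, and the module numbers in the usage are made up.

```typescript
// Hypothetical per-module counts from a mutation run.
interface ModuleStats {
  module: string;
  killed: number;   // mutants detected by at least one failing test
  survived: number; // mutants no test detected
}

// Mutation score per the glossary: percentage of injected faults detected.
function mutationScore(m: ModuleStats): number {
  return (100 * m.killed) / (m.killed + m.survived);
}

// Checks the table's targets: mutation score > 85 percent on every critical
// module, and 100 percent of requirements with a linked test.
// An empty result means the release may proceed.
function releaseGate(critical: ModuleStats[], linkedReqs: number, totalReqs: number): string[] {
  const blockers: string[] = [];
  for (const m of critical) {
    if (m.killed + m.survived === 0) {
      blockers.push(`${m.module}: no mutants generated`);
      continue;
    }
    const score = mutationScore(m);
    if (score <= 85) blockers.push(`${m.module}: mutation score ${score.toFixed(1)}%`);
  }
  if (linkedReqs < totalReqs) {
    blockers.push(`traceability: ${linkedReqs}/${totalReqs} requirements linked`);
  }
  return blockers;
}

// Illustrative numbers only.
const gateResult = releaseGate(
  [
    { module: "billing", killed: 47, survived: 3 },   // 94.0%
    { module: "payments", killed: 40, survived: 10 }, // 80.0%
  ],
  96,
  100,
);
console.log(gateResult.join("\n"));
// payments: mutation score 80.0%
// traceability: 96/100 requirements linked
```

A module with zero generated mutants is treated as a blocker rather than a pass. An empty mutation run proves nothing about assertion strength.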

Maturity in four levels

L1 Manual: Spreadsheet test cases, no mutation testing, flake quarantine folder, no agent. Verification is a bottleneck after code-complete.
L2 Assisted: Copilot autocomplete inside Playwright specs, test matrix in a wiki, manual flake triage, ad-hoc exploratory sessions.
L3 Augmented: Test Strategist agent, four slash prompts, scoped instructions, Playwright MCP wired to GitHub Actions, mutation scans on demand.
L4 Autonomous: Nightly mutation scan with auto-routed issues, flake triage closing root causes within 24 hours, requirements-to-tests traceability at 100 percent, verification dashboard published daily to Microsoft Teams.

Integration with other personas

From Business Analyst: fresh EARS requirements with unique IDs for traceability.
From Developer: merged PR with passing unit tests and a declared spec link.
To Developer: coverage-gap and mutation-survivor issues with suggested test drafts.
From Software Architect: architectural risk register that informs resilience test lanes.
To SRE: verified deployment artifact and updated synthetic probes.
To Compliance Auditor: traceability matrix linking requirements to evidence of test execution.
With UAT Analyst: shared Playwright fixtures between system tests and acceptance tests.

Glossary

Test matrix: the living table mapping requirements to test lanes, owners, and status.
Mutation score: percentage of injected faults detected by the test suite.
Flake: a test that passes and fails non-deterministically on unchanged code.
Exploratory charter: a time-boxed investigation with a stated mission, not a script.
Traceability: a verifiable link from requirement ID to test ID to PR.
Verification lane: a category of automated checks (unit, contract, integration, end-to-end, performance, accessibility, resilience) with its own SLA and runner.
Escape rate: bugs discovered in production that should have been caught before release.
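The glossary's definition of a flake translates directly into a check over CI history: a test is flaky when it has both passed and failed on the same commit. This sketch assumes run outcomes tagged with the commit SHA are available; the `RunOutcome` shape is hypothetical. Unlike the time-window rule in the on-flake hook, this check ignores a pass and a failure that occur on different commits, since that pattern may be a genuine regression.

```typescript
// Hypothetical record of one test execution in CI.
interface RunOutcome {
  test: string;
  commitSha: string;
  passed: boolean;
}

// A test is a flake if it both passed and failed on the same commit,
// i.e. on unchanged code.
function flakyTests(outcomes: RunOutcome[]): string[] {
  const byTestAndCommit = new Map<string, { test: string; pass: boolean; fail: boolean }>();
  for (const o of outcomes) {
    const key = `${o.commitSha}:${o.test}`; // commit SHAs never contain ':'
    const seen = byTestAndCommit.get(key) ?? { test: o.test, pass: false, fail: false };
    if (o.passed) seen.pass = true;
    else seen.fail = true;
    byTestAndCommit.set(key, seen);
  }
  const flaky = new Set<string>();
  byTestAndCommit.forEach((s) => {
    if (s.pass && s.fail) flaky.add(s.test);
  });
  return Array.from(flaky).sort();
}

// Illustrative data only.
const runLog: RunOutcome[] = [
  { test: "checkout.spec.ts", commitSha: "a1b2c3d", passed: true },
  { test: "checkout.spec.ts", commitSha: "a1b2c3d", passed: false },
  { test: "search.spec.ts", commitSha: "a1b2c3d", passed: true },
  { test: "search.spec.ts", commitSha: "e4f5a6b", passed: false }, // code changed: regression, not flake
];
console.log(flakyTests(runLog).join(", ")); // checkout.spec.ts
```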

References

Testing guidance on Microsoft Learn — Azure DevOps Test Plans and strategy
GitHub Actions testing docs — CI patterns for verification
Playwright MCP — Microsoft-maintained MCP for browser automation
Application Insights availability tests — synthetic monitoring
GitHub Copilot for test generation — agent-assisted test authoring
GitHub Advanced Security — CodeQL and secret scanning signals fed into the test plan
Azure Load Testing — performance and resilience lane
Azure Application Insights smart detection — anomaly signals for exploratory charters