Automated vs. Manual Testing Coverage
Automated tools catch roughly 30–40% of WCAG failures — understanding where that ceiling sits (and why) turns audit work into a disciplined, repeatable practice.
Accessibility testing is really two different activities working together. Automated scanners run at machine speed and catch entire classes of mechanical errors. Manual evaluation uses human judgment to assess things machines cannot perceive — meaning, sequence, context, and lived experience. Use only one and you leave large, predictable gaps in coverage. Knowing exactly where those gaps are is what separates a rigorous accessibility practice from checkbox theatre.
Why Automated Tools Have a Hard Ceiling
Automated scanners such as axe-core, IBM Equal Access, Deque Axe DevTools, Lighthouse, and WAVE work by inspecting the DOM and CSSOM for rule violations they can compute without ambiguity. An alt attribute is either present or absent. A contrast ratio is a number. An aria-labelledby reference either resolves or it does not.
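To make "computable without ambiguity" concrete, here is a minimal sketch of two such checks using only Python's standard-library HTML parser. It is not how axe-core or any named tool is implemented; the class and function names are hypothetical, and a real scanner works on the rendered DOM rather than raw markup.

```python
from html.parser import HTMLParser


class MechanicalChecks(HTMLParser):
    """Collects facts that need no human judgment (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()
        self.images_missing_alt: list[str] = []
        self.labelledby_refs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        # Presence check only: alt="" and alt="Photo" both count as present.
        if tag == "img" and "alt" not in attributes:
            self.images_missing_alt.append(attributes.get("src") or "(no src)")
        # aria-labelledby holds a space-separated list of id references.
        for ref in (attributes.get("aria-labelledby") or "").split():
            self.labelledby_refs.append(ref)


def scan(html: str) -> dict[str, list[str]]:
    """Return images without alt and aria-labelledby references that do not resolve."""
    checker = MechanicalChecks()
    checker.feed(html)
    checker.close()
    unresolved = [ref for ref in checker.labelledby_refs if ref not in checker.ids]
    return {
        "missing_alt": checker.images_missing_alt,
        "unresolved_labelledby": unresolved,
    }
```

Both answers are binary, which is exactly why a machine can give them. Nothing in this sketch can say whether an alt text that is present is any good.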
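The hard ceiling sits at around 30–40% of real-world WCAG 2.2 issues. Research from WebAIM, Deque, and TPGi is commonly cited for this range, although the exact figure depends on whether issues are counted by success criterion or by volume of instances. The ceiling exists because the remaining failures require human judgment: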
- Adequacy of alternative text. An image with no alt attribute fails the presence check, but one with alt="image1234.jpg" or alt="Photo" can pass while remaining useless. Only a human can judge whether the text actually conveys what the image shows.
- Reading order and meaningful sequence. CSS can visually reorder content without touching the DOM. A scanner sees only the DOM order; it cannot tell whether that order matches the visual presentation, so a page that looks coherent on screen can read as a scrambled narrative in a screen reader.
- Focus management in dynamic interfaces. A scanner can verify that an element is focusable. It cannot tell whether focus lands somewhere sensible after a dialog closes, a route changes, or an async operation completes.
- Cognitive and language complexity. WCAG 2.2 success criterion 3.1.5 (Reading Level) and related plain-language criteria cannot be assessed by a parser.
- Keyboard operability under real conditions. Running a headless browser against static HTML does not expose focus traps, broken custom widgets, or interaction patterns that only emerge during genuine keyboard navigation.
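Automation can still narrow the search for some of these judgment calls. The sketch below flags alternative text that is present but suspicious, so a human knows where to look first. The word list and the filename pattern are illustrative assumptions, not rules taken from any particular scanner, and a "clean" result says nothing about whether the text is actually adequate.

```python
import re

# Hypothetical list of placeholder words; tune it to your own content.
GENERIC_ALT = {"image", "photo", "picture", "graphic", "img", "icon", "logo"}
FILENAME_PATTERN = re.compile(r"\.(png|jpe?g|gif|svg|webp)$", re.IGNORECASE)


def needs_human_review(alt: str) -> bool:
    """Return True when alt text is present but unlikely to be meaningful."""
    text = alt.strip()
    if text == "":
        # Empty alt marks an image as decorative; whether that is the
        # right call is a separate manual question.
        return False
    if FILENAME_PATTERN.search(text):
        return True
    return text.lower() in GENERIC_ALT
```

A heuristic like this produces candidates, not verdicts: the reviewer still decides whether each flagged image needs a rewrite.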
What Automated Testing Catches Well
Within its slice, automated scanning is extraordinarily efficient. Run it in CI on every pull request and it stops regressions before they ship. High-confidence catches include:
| Category | Example violations detected |
|---|---|
| Image alternatives | Missing alt, empty alt on functional images, redundant title attributes |
| Colour contrast | Text/background ratios below WCAG 2.2 AA (4.5:1 body, 3:1 large text) |
| Form labelling | Inputs without associated labels, buttons with no accessible name |
| Document structure | Skipped heading levels, missing lang attribute on html, duplicate id values |
| ARIA correctness | Invalid role values, required owned elements absent, prohibited attributes |
| Keyboard basics | Interactive elements unreachable by focus due to visibility:hidden or missing tabindex |
| Page title | title element empty or missing |
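The colour contrast row is the clearest example of a check that reduces to arithmetic. The sketch below follows the relative-luminance and contrast-ratio formulas defined by WCAG; the function names are my own, and it assumes opaque six-digit hex colours (no alpha, gradients, or background images, which are where real tools need human help).

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour written as '#rrggbb'."""
    value = hex_colour.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Apply the AA thresholds from the table: 4.5:1 body, 3:1 large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For instance, `passes_aa("#777777", "#ffffff")` fails for body text but passes with `large_text=True`, which is the kind of borderline regression a CI check catches long before a human would notice it.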
Integrate these checks early — pre-commit hooks, PR checks via GitHub Actions, and design-time linting via the axe browser extension or Figma accessibility plugins. Catching a missing label during PR review costs minutes. Catching it after launch costs hours of retesting and deployment cycles.
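One way to wire a scan into a PR check is a small gate script that reads the scanner's results and decides whether the build fails. The sketch below assumes the results were saved as JSON in the shape axe-core reports them (a `violations` list whose entries carry `id`, `impact`, and `nodes`). Which impact levels block a merge is a team policy, so `BLOCKING_IMPACTS` is an assumption to adjust.

```python
# Assumed policy: only these impact levels fail the build.
BLOCKING_IMPACTS = {"serious", "critical"}


def blocking_violations(results: dict) -> list[str]:
    """Summarise the violations that should fail the build."""
    summary = []
    for violation in results.get("violations", []):
        if violation.get("impact") in BLOCKING_IMPACTS:
            count = len(violation.get("nodes", []))
            summary.append(f"{violation['id']}: {count} element(s)")
    return summary


def exit_code(results: dict) -> int:
    """Return 1 when any blocking violation exists, otherwise 0."""
    return 1 if blocking_violations(results) else 0
```

In a CI job the script would load the results file with `json.load`, print the summary, and pass `exit_code(...)` to `sys.exit`. Lower-impact findings can be reported without blocking the merge.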
What Requires Manual Testing
Manual testing is not a backup for when automation fails — it is the primary method for the majority of WCAG criteria. Structure your manual reviews around these categories.
Keyboard Navigation
Walk every user flow using only Tab, Shift+Tab, Enter, Space, arrow keys, and Escape. Verify:
- Focus order follows a logical reading sequence
- Every interactive element receives a visible focus indicator (WCAG 2.2 2.4.7 Focus Visible) that is not hidden behind sticky headers or overlays (2.4.11 Focus Not Obscured (Minimum), 2.4.12 Focus Not Obscured (Enhanced))
- No focus traps exist outside intentional modal contexts
- Custom widgets implement the correct ARIA keyboard interaction patterns (for example, arrow keys move within a listbox, not Tab)
- After a dialog dismisses, focus returns to the trigger element
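Before starting the keyboard walk-through, it can help to list the places where focus order is most likely to surprise you. One common cause is a positive tabindex, which pulls an element ahead of the natural DOM order. The sketch below only finds those elements in markup; it is a hypothetical pre-check, not a substitute for pressing Tab through the flow.

```python
from html.parser import HTMLParser


class TabindexAudit(HTMLParser):
    """Records elements whose tabindex is greater than zero."""

    def __init__(self) -> None:
        super().__init__()
        self.positive: list[tuple[str, int]] = []

    def handle_starttag(self, tag, attrs):
        raw = dict(attrs).get("tabindex")
        if raw is None:
            return
        try:
            value = int(raw)
        except ValueError:
            return  # malformed tabindex values are ignored in this sketch
        if value > 0:
            self.positive.append((tag, value))


def positive_tabindex(html: str) -> list[tuple[str, int]]:
    """Return (tag, tabindex) pairs that override the natural focus order."""
    audit = TabindexAudit()
    audit.feed(html)
    audit.close()
    return audit.positive
```

Each hit is a place to check by hand that the resulting order still follows a logical reading sequence.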
Screen Reader Testing
Test with at least two screen reader and browser pairings. The most representative combinations in 2026 are NVDA + Chrome on Windows and VoiceOver + Safari on macOS/iOS. JAWS + Chrome/Edge covers enterprise Windows contexts. Do not rely on a single combination — screen readers behave differently from each other, especially around live regions and complex ARIA patterns.
During a screen reader pass, verify:
- Headings communicate structure, not just visual styling
- Landmark regions (main, nav, aside, footer) exist and are labelled when more than one of the same type appears
- All images have meaningful alternatives or are marked decorative with alt=""
- Dynamic content changes are announced through aria-live regions with appropriate politeness
- Forms communicate validation errors inline, linked via aria-describedby, not only through colour
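Part of the landmark item can be pre-checked mechanically before the screen reader pass. The sketch below counts the landmark elements named in the list and reports any type that appears more than once with at least one unlabelled instance. It is deliberately simplified and the names are hypothetical: it ignores role attributes and the scoping rules that decide whether a nested footer counts as a landmark at all.

```python
from collections import defaultdict
from html.parser import HTMLParser

LANDMARK_TAGS = {"main", "nav", "aside", "footer"}


class LandmarkCollector(HTMLParser):
    """Records, per landmark tag, whether each instance carries a label."""

    def __init__(self) -> None:
        super().__init__()
        self.labelled: dict[str, list[bool]] = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARK_TAGS:
            attributes = dict(attrs)
            has_label = bool(
                attributes.get("aria-label") or attributes.get("aria-labelledby")
            )
            self.labelled[tag].append(has_label)


def unlabelled_duplicates(html: str) -> list[str]:
    """Return landmark tags that repeat while at least one instance lacks a label."""
    collector = LandmarkCollector()
    collector.feed(html)
    collector.close()
    return sorted(
        tag
        for tag, flags in collector.labelled.items()
        if len(flags) > 1 and not all(flags)
    )
```

Whether the labels are distinct and meaningful when read aloud is still a question for the screen reader pass itself.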
Focus Visibility and Target Size
WCAG 2.2 introduced three new focus criteria (2.4.11 Focus Not Obscured (Minimum), 2.4.12 Focus Not Obscured (Enhanced), and 2.4.13 Focus Appearance) and a target size criterion (2.5.8, minimum 24 x 24 CSS pixels). Automated tools can detect outline: none but cannot reliably measure perceptual focus-indicator contrast against every possible background, nor verify the spacing-based exception for 2.5.8. These require visual inspection.
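The geometry behind 2.5.8 can still be used to shortlist candidates for that visual inspection. The sketch below assumes you already have bounding boxes in CSS pixels (how they are measured is out of scope) and applies the spacing idea from the criterion: a 24-pixel circle centred on an undersized target should not intersect another target, or the circle of another undersized target. The `Target` type and function names are hypothetical, and the other exceptions (inline, equivalent, user-agent controlled, essential) are not modelled.

```python
from dataclasses import dataclass
from math import hypot

MIN_SIZE = 24.0  # CSS pixels, the 2.5.8 minimum


@dataclass(frozen=True)
class Target:
    name: str
    x: float  # left edge
    y: float  # top edge
    width: float
    height: float

    @property
    def centre(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)

    @property
    def undersized(self) -> bool:
        return self.width < MIN_SIZE or self.height < MIN_SIZE


def _circle_hits_rect(centre: tuple[float, float], radius: float, rect: Target) -> bool:
    """True when a circle overlaps a rectangle (touching edges do not count)."""
    cx, cy = centre
    nearest_x = min(max(cx, rect.x), rect.x + rect.width)
    nearest_y = min(max(cy, rect.y), rect.y + rect.height)
    return hypot(cx - nearest_x, cy - nearest_y) < radius


def needs_review(target: Target, others: list[Target]) -> bool:
    """Flag an undersized target whose 24px circle collides with a neighbour."""
    if not target.undersized:
        return False
    radius = MIN_SIZE / 2
    tx, ty = target.centre
    for other in others:
        if _circle_hits_rect(target.centre, radius, other):
            return True
        ox, oy = other.centre
        if other.undersized and hypot(tx - ox, ty - oy) < MIN_SIZE:
            return True
    return False
```

A flagged target is a prompt to look at the rendered page, since overlapping layers, transforms, and responsive breakpoints can all change the real hit area.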
Colour and Meaning
Automation measures contrast ratios. It cannot detect when colour is the only means of conveying information (WCAG 1.4.1). A red required-field indicator, a green “success” badge with no icon or text, a chart with colour-only series labels — all pass a contrast check while failing the criterion.
Cognitive Accessibility
WCAG 2.2 added 3.3.7 (Redundant Entry) and 3.3.8 (Accessible Authentication, Minimum). The authentication criterion prohibits requiring a cognitive function test, such as remembering a password, solving a puzzle, or transcribing characters, unless an alternative method or an assisting mechanism (for example, password-manager autofill) is available. Evaluating this requires understanding the authentication flow as a human would experience it, not parsing DOM attributes.
Motion and Reduced Motion
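Verify that the prefers-reduced-motion media query is respected: animations pause, parallax stops, and auto-playing video stops or slows. Test with the preference emulated, for example through the operating system's reduce-motion setting or the prefers-reduced-motion emulation in the browser's developer tools. A scanner cannot watch an animation play out over time.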
Building a Repeatable Testing Protocol
Ad-hoc manual checks are better than nothing, but they do not scale. A structured protocol ensures consistent coverage across releases.
1. Define scope per release cycle. Not every screen needs a full manual pass every sprint. Triage by change impact: new components always get full coverage; unchanged screens rotate on a schedule.
2. Separate smoke tests from deep audits. A keyboard-and-contrast smoke test on every PR takes about 15 minutes. A full screen reader audit of a complex flow is a 2–4 hour session. Both have a place; conflating them creates either bottlenecks or false confidence.
3. Use a structured test script. Document steps, expected outcomes, and pass/fail criteria. This makes results comparable across testers and releases, and it is the audit trail regulators and legal teams want to see.
4. Involve disabled users. Automated tools and non-disabled testers who have learned to use assistive technology are not substitutes for people who rely on it daily. Usability testing with screen reader users, switch access users, and users with cognitive disabilities surfaces a qualitatively different set of problems. Even one session per quarter raises the quality bar substantially.
5. Track findings with severity ratings. Use a consistent scale: P0 blocks a core task, P1 causes severe friction, P2 moderate, P3 minor. This turns an audit from a list of complaints into a prioritised remediation backlog.
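The severity scale in step 5 is easy to encode so that findings from scanners, manual passes, and user sessions land in one ordered backlog. This is a minimal sketch; the field names are assumptions, and most teams would map them onto fields in their existing issue tracker rather than keep a separate structure.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    P0 = 0  # blocks a core task
    P1 = 1  # severe friction
    P2 = 2  # moderate
    P3 = 3  # minor


@dataclass(frozen=True)
class Finding:
    summary: str
    criterion: str  # WCAG success criterion, e.g. "2.1.2"
    severity: Severity
    source: str  # e.g. "automated", "manual", "user session"


def remediation_backlog(findings: list[Finding]) -> list[Finding]:
    """Order findings by severity first, then by criterion for stable grouping."""
    return sorted(findings, key=lambda f: (f.severity, f.criterion))
```

Sorting on severity first keeps a P0 keyboard trap found in a manual pass above a long tail of P3 scanner warnings, regardless of which method found it.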
Do
- Run automated checks in CI on every PR to catch regressions before review.
- Test keyboard navigation as a distinct pass, step by step through every interactive flow.
- Pair at least two screen reader/browser combinations for any screen reader audit.
- Document manual test scripts so findings are comparable across releases.
- Include users who rely on assistive technology in at least periodic usability sessions.
Don't
- Treat a zero-violation automated scan as proof of accessibility conformance.
- Rely on a single screen reader or a single operating system for all screen reader testing.
- Use colour alone to communicate state, error, or meaning, even when contrast passes.
- Skip focus management testing for dynamic content like modals, toasts, and route changes.
- Defer manual testing to a final pre-launch audit instead of integrating it as a continuous activity.
Mapping Tests to WCAG 2.2 Criteria
The table below maps WCAG 2.2 criteria to how coverage is typically achieved. “Auto” means reliably machine-detectable; “Manual” means primarily human judgment; “Both” means automation can flag candidates but human confirmation is required.
| Criterion (sample) | Primary method |
|---|---|
| 1.1.1 Non-text Content | Both (presence auto; adequacy manual) |
| 1.3.1 Info and Relationships | Both |
| 1.3.2 Meaningful Sequence | Manual |
| 1.4.1 Use of Colour | Manual |
| 1.4.3 / 1.4.6 Contrast | Auto |
| 2.1.1 Keyboard | Manual |
| 2.1.2 No Keyboard Trap | Manual |
| 2.4.3 Focus Order | Manual |
| 2.4.7 Focus Visible | Both |
| 2.4.11 Focus Not Obscured (Minimum) (new in 2.2) | Manual |
| 2.5.8 Target Size (Minimum) (new in 2.2) | Both |
| 3.1.1 Language of Page | Auto |
| 3.3.1 Error Identification | Both |
| 3.3.7 Redundant Entry (new in 2.2) | Manual |
| 3.3.8 Accessible Authentication (Minimum) (new in 2.2) | Manual |
| 4.1.2 Name, Role, Value | Both |
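A mapping like this is more useful when it is machine-readable, because a test plan can then be derived from it. The sketch below transcribes the sample table into a dictionary and lists which criteria in a release's scope still need a human pass. Treating unknown criteria as manual is my own conservative assumption, not something the table states.

```python
# Transcribed from the sample table; "both" means automation only flags candidates.
COVERAGE = {
    "1.1.1": "both", "1.3.1": "both", "1.3.2": "manual", "1.4.1": "manual",
    "1.4.3": "auto", "1.4.6": "auto", "2.1.1": "manual", "2.1.2": "manual",
    "2.4.3": "manual", "2.4.7": "both", "2.4.11": "manual", "2.5.8": "both",
    "3.1.1": "auto", "3.3.1": "both", "3.3.7": "manual", "3.3.8": "manual",
    "4.1.2": "both",
}


def needs_human_pass(criteria: list[str]) -> list[str]:
    """Return the criteria that cannot be signed off by automation alone."""
    # Assumption: anything missing from the mapping defaults to manual.
    return [c for c in criteria if COVERAGE.get(c, "manual") != "auto"]
```

Running it over the sample shows how small the fully automatable slice is: only three of the seventeen entries are marked "auto".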
Tooling Choices in 2026
The tool landscape is mature enough that the choice between axe-core, IBM Equal Access, and Deque’s enterprise Axe DevTools matters less than consistently using whichever you pick. That said, a few practical notes:
- axe-core (open source) is the most widely adopted engine. Its rules are well-documented and the false-positive rate is intentionally low.
- IBM Equal Access covers some criteria axe-core does not, particularly around ARIA pattern conformance. Running both in CI is low overhead and increases coverage.
- WAVE browser extension gives quick visual overlays that are useful during exploratory manual testing. It is not suited for CI but is excellent for on-page triage.
- Accessibility Insights for Web (Microsoft, built on axe) adds a guided manual testing module that walks testers through criteria automation cannot cover. It is a good scaffold for teams building their first manual protocol.
- Figma accessibility plugins (Stark, A11y Annotation Kit) shift some checks left into design, before any code exists. Checking contrast and documenting focus order in Figma is faster and cheaper than fixing it in production.
Avoid the outdated habit of treating a one-time scan by an external vendor as your accessibility programme. Point-in-time audits expire the moment you ship the next feature. The goal is a testing loop that runs continuously.
Communicating Coverage to Stakeholders
Accessibility testing generates findings, and findings need owners and timelines. A few habits that improve follow-through:
- Frame coverage as a percentage, not a binary. “We have automated coverage of roughly 35% of applicable criteria and manual coverage of an additional 50% based on our last audit cycle” is honest and actionable. “We passed our accessibility audit” is not.
- Attach conformance statements to releases. A short Accessibility Conformance Report (ACR), typically based on the VPAT template, for each major release documents what was tested, the methodology, and known exceptions. This supports the organisation's legal position and signals commitment to users.
- Make the remediation backlog visible. Accessibility issues buried in a separate spreadsheet get deprioritised. Track them in the same system as other engineering work — with severity, affected user group, and WCAG criterion documented — so they compete on equal footing.
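The percentage framing in the first habit is simple set arithmetic once you record which criteria each method actually exercised. A minimal sketch, assuming three hypothetical sets of criterion identifiers collected from your own audit records:

```python
def coverage_summary(
    applicable: set[str], automated: set[str], manual: set[str]
) -> dict[str, object]:
    """Report automated coverage, additional manual coverage, and what is untested."""
    if not applicable:
        raise ValueError("at least one applicable criterion is required")
    auto_hit = applicable & automated
    # Count manual coverage only where automation did not already reach.
    manual_only = (applicable & manual) - auto_hit
    total = len(applicable)
    return {
        "automated_pct": round(100 * len(auto_hit) / total),
        "additional_manual_pct": round(100 * len(manual_only) / total),
        "untested": sorted(applicable - auto_hit - manual_only),
    }
```

Reporting the untested list alongside the two percentages keeps the statement honest: stakeholders see not just how much was covered but exactly which criteria were not.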