Automated vs. Manual Testing Coverage

Accessibility testing is really two different activities working together. Automated scanners run at machine speed and catch entire classes of mechanical errors. Manual evaluation uses human judgment to assess things machines cannot perceive — meaning, sequence, context, and lived experience. Use only one and you leave large, predictable gaps in coverage. Knowing exactly where those gaps are is what separates a rigorous accessibility practice from checkbox theatre.

Why Automated Tools Have a Hard Ceiling

Static analysis tools such as axe-core, IBM Equal Access, Deque Axe DevTools, Lighthouse, and WAVE work by inspecting the DOM and CSSOM for rule violations they can compute without ambiguity. An alt attribute is either present or absent. A contrast ratio is a number. An aria-labelledby reference either resolves or it does not.
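
To make "a contrast ratio is a number" concrete, here is a minimal sketch of the WCAG contrast computation (relative luminance of each colour, then the ratio of the lighter to the darker). The function names are illustrative and not taken from any scanner's API. The thresholds are the WCAG AA values: 4.5:1 for body text and 3:1 for large text.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour written as '#rrggbb'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio: 1.0 for identical colours, 21.0 for black on white."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Compare against the AA thresholds: 4.5:1 body text, 3:1 large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, `passes_aa("#767676", "#ffffff")` is true while `passes_aa("#777777", "#ffffff")` is false: the two greys look almost identical but sit on either side of 4.5:1. A rule engine can decide this with no human input.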

The hard ceiling sits at around 30–40% of real-world WCAG 2.2 issues. Research from WebAIM, Deque, and TPGi consistently lands in this range. The ceiling exists because the remaining failures require human judgment:

Adequacy of alternative text. An image with no alt attribute fails the presence check, but one with alt="image1234.jpg" or alt="Photo" passes while remaining useless. Only a human can judge whether the text actually conveys what the image shows.
Reading order and meaningful sequence. CSS can visually reorder content without touching the DOM. A scanner may see a logical DOM order while a screen reader user experiences a scrambled narrative.
Focus management in dynamic interfaces. A scanner can verify that an element is focusable. It cannot tell whether focus lands somewhere sensible after a dialog closes, a route changes, or an async operation completes.
Cognitive and language complexity. WCAG 2.2 success criterion 3.1.5 (Reading Level) and related plain-language criteria cannot be assessed by a parser.
Keyboard operability under real conditions. Running a headless browser against static HTML does not expose focus traps, broken custom widgets, or interaction patterns that only emerge during genuine keyboard navigation.
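
The alternative-text case shows where the ceiling sits. The sketch below is a hypothetical triage helper, not any scanner's actual rule: it can settle a missing attribute mechanically and can flag obviously weak values, but it deliberately has no "pass" outcome. The word list and file-extension pattern are assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical list of generic values; tune it to your own content.
GENERIC_ALT = {"image", "photo", "picture", "graphic", "icon", "img"}
FILENAME_LIKE = re.compile(r"\.(jpe?g|png|gif|svg|webp)$", re.IGNORECASE)


def triage_alt(alt: Optional[str]) -> str:
    """Classify an alt value; None means the attribute is absent.

    Only 'missing' is a definite machine verdict. Every other outcome
    still needs a person to look at the image.
    """
    if alt is None:
        return "missing"             # the classic automated catch
    text = alt.strip()
    if not text:
        return "decorative"          # empty alt: confirm the image really is decorative
    if FILENAME_LIKE.search(text) or text.lower() in GENERIC_ALT:
        return "suspicious"          # present, but almost certainly inadequate
    return "needs-human-review"      # present and plausible; adequacy is a judgment call
```

Even a carefully written alt value lands in `needs-human-review`, because the function never sees the image.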

What Automated Testing Catches Well

Within its slice, automated scanning is extraordinarily efficient. Run it in CI on every pull request and it stops regressions before they ship. High-confidence catches include:

Category	Example violations detected
Image alternatives	Missing `alt`, empty `alt` on functional images, redundant `title` attributes
Colour contrast	Text/background ratios below WCAG 2.2 AA (4.5:1 body, 3:1 large text)
Form labelling	Inputs without associated labels, buttons with no accessible name
Document structure	Skipped heading levels, missing `lang` attribute on `html`, duplicate `id` values
ARIA correctness	Invalid role values, required owned elements absent, prohibited attributes
Keyboard basics	Interactive elements unreachable by focus due to `visibility:hidden` or missing `tabindex`
Page title	`title` element empty or missing

Integrate these checks early — pre-commit hooks, PR checks via GitHub Actions, and design-time linting via the axe browser extension or Figma accessibility plugins. Catching a missing label during PR review costs minutes. Catching it after launch costs hours of retesting and deployment cycles.
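
One way to wire a scanner into a PR check is a small gate script that reads the scan output and fails the build on severe findings. The sketch below assumes the scan has already produced a dictionary shaped like the axe-core results object, with a `violations` list whose entries carry `id` and `impact`. The function name and the default threshold are illustrative choices, not part of axe-core.

```python
# axe-core impact levels, from least to most severe.
IMPACT_RANK = {"minor": 0, "moderate": 1, "serious": 2, "critical": 3}


def blocking_violations(results: dict, threshold: str = "serious") -> list:
    """Return the rule ids whose impact is at or above the threshold."""
    floor = IMPACT_RANK[threshold]
    blocking = []
    for violation in results.get("violations", []):
        # Treat a missing or unrecognised impact as the lowest level.
        rank = IMPACT_RANK.get(violation.get("impact"), 0)
        if rank >= floor:
            blocking.append(violation["id"])
    return blocking
```

A CI step would load the saved JSON, call `blocking_violations`, print the rule ids, and exit non-zero if the list is not empty. Starting with a `serious` threshold and tightening it over time keeps the gate from being switched off in frustration.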

What Requires Manual Testing

Manual testing is not a backup for when automation fails — it is the primary method for the majority of WCAG criteria. Structure your manual reviews around these categories.

Keyboard Navigation

Walk every user flow using only Tab, Shift+Tab, Enter, Space, arrow keys, and Escape. Verify:

Focus order follows a logical reading sequence
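Every interactive element receives a visible focus indicator (WCAG 2.2 2.4.7 Focus Visible) that is not hidden behind sticky headers, banners, or other overlays (2.4.11 Focus Not Obscured (Minimum), 2.4.12 Focus Not Obscured (Enhanced))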
No focus traps exist outside intentional modal contexts
Custom widgets implement the correct ARIA keyboard interaction patterns (for example, arrow keys move within a listbox, not Tab)
After a dialog dismisses, focus returns to the trigger element
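
Before a keyboard pass it helps to write down the order in which Tab should visit elements, so the tester has something to compare against. The sketch below models the standard tabindex rules: positive values first in ascending order, then tabindex 0 in DOM order, with negative values skipped. The `Focusable` type and element names are hypothetical. It ignores CSS reordering, shadow DOM, and scripted focus changes, which is why the walk-through itself stays manual.

```python
from dataclasses import dataclass


@dataclass
class Focusable:
    name: str          # hypothetical label; list elements in DOM order
    tabindex: int = 0  # 0 also stands in for natively focusable elements


def tab_order(elements: list) -> list:
    """Names in the order Tab visits them, for elements given in DOM order."""
    # sorted() is stable, so equal positive values keep their DOM order.
    positive = sorted((e for e in elements if e.tabindex > 0), key=lambda e: e.tabindex)
    natural = [e for e in elements if e.tabindex == 0]
    return [e.name for e in positive + natural]


def reordered_by_tabindex(elements: list) -> bool:
    """True when positive tabindex values pull the tab order away from DOM order."""
    dom_order = [e.name for e in elements if e.tabindex >= 0]
    return tab_order(elements) != dom_order
```

A `True` result from `reordered_by_tabindex` is a prompt to check that screen on a real keyboard first, since positive tabindex values are a common cause of illogical focus order.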

Screen Reader Testing

Test with at least two screen reader and browser pairings. The most representative combinations in 2026 are NVDA + Chrome on Windows and VoiceOver + Safari on macOS/iOS. JAWS + Chrome/Edge covers enterprise Windows contexts. Do not rely on a single combination: screen readers behave differently from each other, especially around live regions and complex ARIA patterns.

During a screen reader pass, verify:

Headings communicate structure, not just visual styling
Landmark regions (main, nav, aside, footer) exist and are labelled when more than one of the same type appears
All images have meaningful alternatives or are marked decorative with alt=""
Dynamic content changes are announced through aria-live regions with appropriate politeness
Forms communicate validation errors inline, linked via aria-describedby, not only through colour

Focus Visibility and Target Size

WCAG 2.2 introduced three new focus criteria (2.4.11 Focus Not Obscured (Minimum), 2.4.12 Focus Not Obscured (Enhanced), and 2.4.13 Focus Appearance) and a target size criterion (2.5.8 Target Size (Minimum), 24 by 24 CSS pixels). Automated tools can detect outline: none but cannot reliably measure perceptual focus-indicator contrast against every possible background, tell whether a sticky header or overlay covers the focused element, or verify the spacing-based exception for 2.5.8. These require visual inspection.

Colour and Meaning

Automation measures contrast ratios. It cannot detect when colour is the only means of conveying information (WCAG 1.4.1). A red required-field indicator, a green “success” badge with no icon or text, a chart with colour-only series labels — all pass a contrast check while failing the criterion.

Cognitive Accessibility

WCAG 2.2 added 3.3.7 (Redundant Entry) and 3.3.8 (Accessible Authentication (Minimum)). The authentication criterion specifically prohibits requiring a cognitive function test, such as solving a puzzle, transcribing characters, or recalling a password, unless an alternative method or an assisting mechanism is available. Evaluating this requires understanding the authentication flow as a human would experience it, not parsing DOM attributes.

Motion and Reduced Motion

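Verify that the prefers-reduced-motion media query is respected: animations pause, parallax stops, and auto-playing video stops or slows. Test with the operating system's reduce-motion setting enabled, or emulate the media feature in browser developer tools (in Chrome DevTools, the Rendering panel offers this). A scanner cannot watch an animation play out over time.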

Building a Repeatable Testing Protocol

Ad-hoc manual checks are better than nothing, but they do not scale. A structured protocol ensures consistent coverage across releases.

1. Define scope per release cycle. Not every screen needs a full manual pass every sprint. Triage by change impact: new components always get full coverage; unchanged screens rotate on a schedule.

2. Separate smoke tests from deep audits. A keyboard-and-contrast smoke test on every PR takes about 15 minutes. A full screen reader audit of a complex flow is a 2–4 hour session. Both have a place; conflating them creates either bottlenecks or false confidence.

3. Use a structured test script. Document steps, expected outcomes, and pass/fail criteria. This makes results comparable across testers and releases, and it is the audit trail regulators and legal teams want to see.

4. Involve disabled users. Automated tools and non-disabled testers who have learned to use assistive technology are not substitutes for people who rely on it daily. Usability testing with screen reader users, switch access users, and users with cognitive disabilities surfaces a qualitatively different set of problems. Even one session per quarter raises the quality bar substantially.

5. Track findings with severity ratings. Use a consistent scale: P0 blocks a core task, P1 causes severe friction, P2 moderate, P3 minor. This turns an audit from a list of complaints into a prioritised remediation backlog.
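
The severity scale in step 5 can be encoded directly so that every audit produces the same ordering. This is a minimal sketch. The `Finding` fields and function names are illustrative, and the sample findings in the usage note are invented.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    P0 = 0  # blocks a core task
    P1 = 1  # causes severe friction
    P2 = 2  # moderate
    P3 = 3  # minor


@dataclass
class Finding:
    summary: str
    criterion: str      # WCAG success criterion number, e.g. "2.1.2"
    severity: Severity


def remediation_backlog(findings: list) -> list:
    """Most severe first; equal severities keep the order they were logged in."""
    return sorted(findings, key=lambda f: f.severity)


def release_blockers(findings: list) -> list:
    """Findings that block a core task and so should hold the release."""
    return [f for f in findings if f.severity == Severity.P0]
```

Given a P2 contrast finding, a P0 keyboard trap in checkout, and a P1 unnamed icon button, `remediation_backlog` returns them in P0, P1, P2 order and `release_blockers` returns only the keyboard trap.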

Do

Run automated checks in CI on every PR to catch regressions before review.

Test keyboard navigation as a distinct pass, step by step through every interactive flow.

Pair at least two screen reader/browser combinations for any screen reader audit.

Document manual test scripts so findings are comparable across releases.

Include users who rely on assistive technology in at least periodic usability sessions.

Don't

Treat a zero-violation automated scan as proof of accessibility conformance.

Rely on a single screen reader or a single operating system for all screen reader testing.

Use colour alone to communicate state, error, or meaning — even when contrast passes.

Skip focus management testing for dynamic content like modals, toasts, and route changes.

Defer manual testing to a final pre-launch audit; integrate it as a continuous activity.

Mapping Tests to WCAG 2.2 Criteria

The table below maps WCAG 2.2 criteria to how coverage is typically achieved. “Auto” means reliably machine-detectable; “Manual” means primarily human judgment; “Both” means automation can flag candidates but human confirmation is required.

Criterion (sample)	Primary method
1.1.1 Non-text Content	Both (presence auto; adequacy manual)
1.3.1 Info and Relationships	Both
1.3.2 Meaningful Sequence	Manual
1.4.1 Use of Colour	Manual
1.4.3 / 1.4.6 Contrast	Auto
2.1.1 Keyboard	Manual
2.1.2 No Keyboard Trap	Manual
2.4.3 Focus Order	Manual
2.4.7 Focus Visible	Both
2.4.11 Focus Not Obscured (new in 2.2)	Manual
2.5.8 Target Size Minimum (new in 2.2)	Both
3.1.1 Language of Page	Auto
3.3.1 Error Identification	Both
3.3.7 Redundant Entry (new in 2.2)	Manual
3.3.8 Accessible Authentication (new in 2.2)	Manual
4.1.2 Name, Role, Value	Both
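
The mapping can also live in code so that a release checklist is generated rather than remembered. The sketch below copies the sample table into a dictionary (the combined 1.4.3 / 1.4.6 row is split into two keys) and treats any criterion not listed as needing a human pass, which is a deliberately conservative assumption.

```python
# Primary method per criterion, copied from the sample table.
METHOD = {
    "1.1.1": "both",
    "1.3.1": "both",
    "1.3.2": "manual",
    "1.4.1": "manual",
    "1.4.3": "auto",
    "1.4.6": "auto",
    "2.1.1": "manual",
    "2.1.2": "manual",
    "2.4.3": "manual",
    "2.4.7": "both",
    "2.4.11": "manual",
    "2.5.8": "both",
    "3.1.1": "auto",
    "3.3.1": "both",
    "3.3.7": "manual",
    "3.3.8": "manual",
    "4.1.2": "both",
}


def needs_human_pass(criteria: list) -> list:
    """Criteria that a clean automated scan does not settle on its own."""
    # Unknown criteria default to "manual" so nothing is silently skipped.
    return [c for c in criteria if METHOD.get(c, "manual") != "auto"]
```

In this sample only 3 of the 17 entries are fully automatable, so a zero-violation scan leaves the other 14 unconfirmed.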

Tooling Choices in 2026

The tool landscape is mature enough that the choice between axe-core, IBM Equal Access, and Deque’s enterprise Axe DevTools matters less than consistently using whichever you pick. That said, a few practical notes:

axe-core (open source) is the most widely adopted engine. Its rules are well-documented and the false-positive rate is intentionally low.
IBM Equal Access covers some criteria axe-core does not, particularly around ARIA pattern conformance. Running both in CI is low overhead and increases coverage.
WAVE browser extension gives quick visual overlays that are useful during exploratory manual testing. It is not suited for CI but is excellent for on-page triage.
Accessibility Insights for Web (Microsoft, built on axe) adds a guided manual testing module that walks testers through criteria automation cannot cover. It is a good scaffold for teams building their first manual protocol.
Figma accessibility plugins (Stark, A11y Annotation Kit) shift some checks left into design, before any code exists. Checking contrast and documenting focus order in Figma is faster and cheaper than fixing it in production.

Avoid the outdated habit of treating a one-time scan by an external vendor as your accessibility programme. Point-in-time audits expire the moment you ship the next feature. The goal is a testing loop that runs continuously.

Communicating Coverage to Stakeholders

Accessibility testing generates findings, and findings need owners and timelines. A few habits that improve follow-through:

Frame coverage as a percentage, not a binary. “We have automated coverage of roughly 35% of applicable criteria and manual coverage of an additional 50% based on our last audit cycle” is honest and actionable. “We passed our accessibility audit” is not.
Attach conformance statements to releases. A short Accessibility Conformance Report (ACR), typically written using the VPAT template, for each major release documents what was tested, the methodology, and known exceptions. This protects the organisation legally and signals commitment to users.
Make the remediation backlog visible. Accessibility issues buried in a separate spreadsheet get deprioritised. Track them in the same system as other engineering work — with severity, affected user group, and WCAG criterion documented — so they compete on equal footing.
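
The percentage framing is simple arithmetic once the audit records which criteria were covered by which method. The helper below is a hypothetical sketch that assumes each criterion is counted once, under either automated or manual coverage.

```python
def coverage_statement(applicable: int, automated: int, manual: int) -> str:
    """Summarise coverage as percentages of the applicable criteria."""
    if applicable <= 0:
        raise ValueError("applicable must be positive")
    if automated < 0 or manual < 0 or automated + manual > applicable:
        raise ValueError("covered criteria must fit within the applicable total")
    untested = applicable - automated - manual

    def pct(count: int) -> int:
        return round(100 * count / applicable)

    return (
        f"Automated: {pct(automated)}%, manual: {pct(manual)}%, "
        f"untested: {pct(untested)}% of {applicable} applicable criteria"
    )
```

With 40 applicable criteria, 14 covered by automation and 20 by manual review, the statement reads "Automated: 35%, manual: 50%, untested: 15% of 40 applicable criteria". Reporting the untested share alongside the other two keeps the remaining gap visible.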