You are here: ArticlesAutomating Screen Readers for Accessibility Testing

Automating Screen Readers for Accessibility Testing

July 10, 2025 by Weston Thayer

Contents

Why automate screen reader testing?
Tools, drivers, and libraries
- W3C AT Driver API
- Guidepup
Understanding by example with Guidepup
Examining ROI
Maximizing ROI by going beyond screen readers
Wrapping up

Automated accessibility testing is an essential component of a successful accessibility strategy. It allows us to create continuous processes that scale, save an enormous amount of manual work, and ultimately deliver a high-quality user experience.

Similarly, screen readers are an important aspect of accessibility testing. Many accessibility training sessions include a demo of the presenter navigating the company’s website with a screen reader such as NVDA. As the accessibility consultancy WebAIM says, it can “be an ‘eye-opening’ experience that takes sighted users out of their normal comfort zone,” introducing teams to an often unconsidered perspective.

Traditional automated accessibility tools haven’t included screen readers — at least not directly. Static analysis, which applies a generic ruleset to web pages or source code, is the most popular tooling category today. Examples include axe-core, WAVE, eslint-plugin-jsx-a11y, axe Linter, and others.

Today, new tools are being developed to enable automating screen readers. How do they work and should you consider adopting them?

Why automate screen reader testing?

Static analysis has a big coverage gap. It can tell you if a page has a malformed heading structure, but it can’t tell you if the page could use more headings, or if the headings are appropriate and useful. It also can’t test tricky ARIA live regions or interact with controls like a screen reader will.

That said, many of the rules used by static analyzers were written with screen readers in mind. For example, axe’s aria-allowed-role rule exists in part because of compatibility issues observed in screen readers at one point in time. But there’s a big difference between ensuring anti-patterns aren’t in use and interactively testing compatibility with a real screen reader.

Manually testing with a screen reader can give you confidence in all of the above, but manual testing doesn’t scale. An automated solution might.

Tools, drivers, and libraries

There are several open source solutions for automating screen readers like NVDA, VoiceOver, and JAWS. At Assistiv Labs, we also work on screen reader automation, powered by our assistive technology cloud, as a part of our end-to-end accessibility testing service.

Next, we’ll cover the two most influential open source projects, from our perspective, and evaluate where they’re useful.

W3C AT Driver API

The w3c/at-driver project emerged from the ARIA-AT W3C Community Group, which aims to achieve “assistive technology interoperability” on the web by creating a test suite that people developing screen readers, browsers, and new web technologies can use to understand if things are working well — for every screen reader and every browser.

Video courtesy of the ARIA-AT W3C Community Group

It’s an ambitious goal which will require automation. The AT Driver draft specification defines an official protocol that screen readers can implement to enable robust, low-level automation.

While a 3rd party NVDA driver implementation exists, we’ll have to wait for the project to mature to see broader support and easy-to-follow documentation.

Guidepup

Craig Morten’s Guidepup project has great documentation and supports NVDA, macOS VoiceOver, and a lightweight screen reader simulator written in JavaScript he calls virtual-screen-reader.

Developers familiar with Node.js will appreciate Jest and Playwright integrations. Leveraging Playwright to automate a browser, typical Guidepup code looks like this:

// Navigate to Guidepup GitHub page
await page.goto("https://github.com/guidepup/guidepup");

// Wait for page to be ready
await page.locator('header[role="banner"]').waitFor();

// Interact with the page
await voiceOver.navigateToWebContent();

// Move across the page menu to the Guidepup heading using VoiceOver
while ((await voiceOver.itemText()) !== "Guidepup heading level 1") {
  await voiceOver.perform(voiceOver.keyboardCommands.findNextHeading);
}

// Assert that the spoken phrases are as expected
expect(JSON.stringify(await voiceOver.spokenPhraseLog())).toBe([
  "Guidepup heading level 1",
  "Screen Reader A11y Workflows",
  "Full Control heading level 3",
  "Mirrors Real User Experience heading level 3",
  "Framework Agnostic heading level 3"
]);

Be aware you can only run VoiceOver tests on macOS and NVDA tests on Windows. Each requires careful environment setup that includes some sensitive steps — be sure to check with your security team before giving it a try. The virtual-screen-reader is setup-free, but comes with disclaimers.

Understanding by example with Guidepup

To build an understanding of what these tools do and how they work, we’ll adopt an example from Guidepup (the AT Driver API uses very similar concepts). At a high level, the automation is performed by supplying a test script — code instructing what the screen reader should do — to a test runner, which automates the screen reader (and potentially browser) to follow the instructions. The Guidepup example below is a test script that does the following:

Loads https://www.guidepup.dev
Reads the page’s 4 headings
Reads the “Get Started and GitHub” call to action links (above the footer)
Invokes the “Get Started” link

If you’d like to run the automation yourself, follow the instructions in this Gist. A video of the example running is below:

Flashing content warning: NVDA visual highlighting is turned on which causes a few red and blue borders to quickly flash. Note: Guidepup’s automated NVDA does not vocalize or speak, its output is only visible via the speech viewer.

Show video description

A screen recording of a Windows PC.

Two app windows are visible. In the background, VS Code has the example in the Gist above open. In the foreground, a "Node.js command prompt" shows an empty prompt.

npx playwright test is typed into the prompt and then executed.

The prompt displays Running 1 test using 1 worker briefly before a Chromium browser window takes up most of the screen.

NVDA speech viewer shows, then NVDA's visual highlight blue and run rectangles start to show. The browser loads guidepup.dev. The initial speech viewer content says:

document
about:blank
Connected to control server
Screen reader driver for test automation | Guidepup document busy blank
Screen reader driver for test automation | Guidepup document
clickable Skip to main content region link Skip to main content
Screen reader driver for test automation | Guidepup - Chromium

Then the headings on the page start to receive focus, in the same order as in the test script below, and the speech viewer shows each heading’s announcement. NVDA’s visual highlight roughly tracks on the page.

Finally, the browser closes and the prompt says 1 passed (26.4s).

Following is the test script, modified and commented for readability:

↓ Skip to after Guidepup example

test("I can navigate the Guidepup Github page", async ({ page, nvda }) => {
  // 1. Loads https://www.guidepup.dev
  await page.goto("https://www.guidepup.dev");
  // this means delay until the h1 is visible
  await page.locator("h1").waitFor();
  await delay(500);
  // Tell NVDA to focus on the browser (instead of elsewhere in the OS)
  // Make sure NVDA is not in focus mode.
  await nvda.perform(nvda.keyboardCommands.exitFocusMode);
  // Ensure application is brought to front and focused.
  await nvda.perform(nvda.keyboardCommands.reportTitle);
  let windowTitle = await nvda.lastSpokenPhrase();
  let applicationSwitchRetryCount = 0;
  while (!windowTitle.includes("Chromium") && applicationSwitchRetryCount < 10) {
    applicationSwitchRetryCount++;
    await nvda.perform({
      keyCode: [WindowsKeyCodes.Escape],
      modifiers: [WindowsModifiers.Alt],
    });
    await nvda.perform(nvda.keyboardCommands.reportTitle);
    windowTitle = await nvda.lastSpokenPhrase();
  }
  // Clear out logs.
  await nvda.clearItemTextLog();
  await nvda.clearSpokenPhraseLog();

  // 2. Read the page's 4 headings
  let headingCount = 0;
  const expectedHeadings = [ // NVDA announcements
    "main landmark, Guidepup, heading, level 1",
    "Reliable Automation For Your Screen Reader A 11y Workflows Through Java Script, heading, level 2",
    "Full Control, heading, level 3",
    "Mirrors Real User Experience, heading, level 3",
    "Framework Agnostic, heading, level 3",
  ];
  while (
    !(await nvda.lastSpokenPhrase()).includes("Framework Agnostic") &&
    headingCount <= 10
  ) {
    await nvda.perform(nvda.keyboardCommands.moveToNextHeading);
    expect(await nvda.lastSpokenPhrase()).toBe(expectedHeadings[headingCount]);
    headingCount++;
  }

  // 3. Reads the "Get Started and GitHub" links (above the footer)
  let nextCount = 0;
  const expectedContent = [
    "Run with Jest, with Playwright, as an independent script, no vendor lock in.",
    "link, Get Started",
    "link, Git Hub",
  ];
  while (
    !(await nvda.lastSpokenPhrase()).includes("Git Hub") &&
    nextCount <= 10
  ) {
    await nvda.next();
    expect(await nvda.lastSpokenPhrase()).toBe(expectedContent[nextCount]);
    nextCount++;
  }

  // 4. Invokes the "Get Started" link
  await nvda.previous();
  await nvda.act();
});

↑ Skip to before Guidepup example

The first thing to notice is this test code was written specifically for the guidepup.dev home page by someone who understands screen readers. It won’t work on another website or even another page within Guidepup’s website.

That’s a foundational difference between screen reader automation and static analyzers, like axe. Static analyzers can evaluate any web page, out of the box. Screen reader automation is written to evaluate specific web pages. There are both strengths and weaknesses to this approach:

	Static analyzers	Screen reader automation
Implementation cost	🟢 Low	🟡 Medium to high
Coverage	🟡 Broad, but often shallow	🟡 Potentially very deep, depends on quality and number of tests written
Interactive	🔴 No, attempts to flag common issues that may block interaction, but does not perform interactions	🟢 Yes, natively performs interactions
Contextual	🔴 No, generic rules can’t understand a specific website’s unique context	🟢 Yes, written for specific websites

To put it in example, an axe-core scan of guidepup.dev wouldn’t give confidence that the page contains 5 meaningful headings. axe-core simply ensures that the page has an h1 and if it does contain additional headings, that they don’t skip levels. Nor would it ensure the meaningful sequence of the call to action links.

When the Guidepup test is run, it provides high confidence that all of those things still exist and function as expected with NVDA.

Examining ROI

The above example can definitely provide some coverage that static analyzers cannot, but at the cost of time and expertise. Someone skilled in screen readers had to…

Evaluate the existing NVDA experience to determine whether it contains any bugs
Learn the test syntax and API
Write the test
Debug the test until it passes

To decide whether the time investment to develop screen reader automation for a website is worth it, it’s helpful to understand how much coverage it provides compared to static analyzers. If screen reader automation greatly expands coverage, it could be well worth it — catching many previously undetected bugs in real time (when they’re cheap to fix!) and providing trustworthy monitoring. If not, there’s a chance it creates more work than it saves.

One approach is to compare how each of the 55 WCAG 2.2 AA success criteria were validated during both the screen reader automation example vs. an axe-core scan of the page.

↓ Skip to after table

WCAG 2.2 SC	Guidepup example	axe-core	Notes
1.1.1 Non-text Content (Level A)	✅ narrow, deep	✅ shallow, broad	Screen reader automation can validate that images and other non-text content are announced with a text alternative. At the time of writing, guidepup.dev doesn’t include any non-text content. But if it updates (for example, if the GitHub link becomes an icon button), the test will ensure it still, almost certainly, has the text alternative of “GitHub”. axe-core evaluates the whole page, not just the headings and CTA links for the following rules involving 1.1.1: aria-meter-name, aria-progressbar-name, image-alt, input-image-alt, object-alt, role-img-alt, svg-img-alt. It can detect if an image is added to the page without an alt, but it cannot ensure the alt is accurate. Note: both tools miss that many links on the page are followed by an “opens in new tab” icon which has no text alternative.
1.2.1 Audio-only and Video-only (Prerecorded) (Level A)	❌	❌
1.2.2 Captions (Prerecorded) (Level A)	❌	✅ shallow, broad	axe-core evaluates the whole page for the following rules: video-caption.
1.2.3 Audio Description or Media Alternative (Prerecorded) (Level A)	❌	❌
1.2.4 Captions (Live) (Level AA)	❌	❌
1.2.5 Audio Description (Prerecorded) (Level AA)	❌	❌
1.3.1 Info and Relationships (Level A)	🤔	✅ shallow, broad	Screen reader automation in this example snapshots the semantic heading structure as announced by NVDA. But it will not detect if the visual page diverges, for example if a visual heading is added with `<div class=”heading-3”>`, which is a crucial aspect of 1.3.1. axe-core evaluates the whole page for the following rules involving 1.3.1: aria-hidden-body, aria-required-children, aria-required-parent, definition-list, dlitem, list, listitem, p-as-heading, table-fake-caption, td-has-header, td-headers-attr, th-has-data-cells. However, it provides no validation of the existing heading structure nor can it detect if a visual heading that lacks semantics is added.
1.3.2 Meaningful Sequence (Level A)	✅ narrow	❌	Screen reader automation in this example snapshots a meaningful heading and link structure programmatically exposed to and announced by NVDA. The test will continue to validate that structure, but does not cover the whole page’s sequence.
1.3.3 Sensory Characteristics (Level A)	❌	❌
1.3.4 Orientation (Level AA)	❌	✅	Screen reader automation in this example does not attempt to change the orientation. Even if it did, a screen reader is probably the wrong way to validate orientation. axe-core is checking for the following rule: css-orientation-lock.
1.3.5 Identify Input Purpose (Level AA)	❌	✅ shallow, broad	The example does not include any inputs. If inputs did exist, screen reader automation may not detect changes to the `autocomplete` attribute, since it primarily controls browser features. axe-core is checking each page for the following rule: autocomplete-valid, which can detect invalid autocomplete values but cannot verify whether the provided value is appropriate for the input.
1.4.1 Use of Color (Level A)	❌	✅ shallow, broad	Screen reader automation does not evaluate visuals. axe-core is checking each page for the following rule: link-in-text-block.
1.4.2 Audio Control (Level A)	❌	✅ shallow, broad	Screen reader automation does not evaluate audio originating outside the screen reader. axe-core is checking each page for the following rules: no-autoplay-audio.
1.4.3 Contrast (Minimum) (Level AA)	❌	✅ moderately deep, broad	Screen reader automation does not evaluate visuals. axe-core is checking each page/state for the following rules: color-contrast.
1.4.4 Resize Text (Level AA)	❌	✅ shallow, broad	Screen reader automation does not evaluate visuals. axe-core is checking each page for the following rules: meta-viewport. Which prevents a common issue but falls far short of full 1.4.4 validation.
1.4.5 Images of Text (Level AA)	❌	❌
1.4.10 Reflow (Level AA)	❌	❌
1.4.11 Non-Text Contrast (Level AA)	❌	❌
1.4.12 Text Spacing (Level AA)	❌	✅ shallow, broad	Screen reader automation does not evaluate visuals. axe-core is checking each page for the following rules: avoid-inline-spacing.
1.4.13 Content on Hover or Focus (Level AA)	❌	❌	Screen reader automation in this example uses NVDA’s virtual cursor for navigation, which does not trigger hover or focus. Even if it did, screen reader automation does not evaluate visuals.
2.1.1 Keyboard (Level A)	❌	❌*	Screen reader automation in this example uses NVDA’s virtual cursor for navigation, which is not the same as keyboard only navigation. *axe-core is checking each page for the following rules: frame-focusable-content, scrollable-region-focusable, server-side-image-map. But this is extremely minimal coverage of 2.1.1.
2.1.2 No Keyboard Trap (Level A)	❌	❌	Screen reader automation in this example uses NVDA’s virtual cursor for navigation, which is not the same as keyboard only navigation.
2.1.4 Character Key Shortcuts (Level A)	❌	❌
2.2.1 Timing Adjustable (Level A)	❌	✅ shallow, broad	axe-core is checking each page for the following rules: meta-refresh. Which prevents a common issue but falls far short of full 2.2.1 coverage.
2.2.2 Pause, Hide, Stop (Level A)	❌	✅ shallow, broad	axe-core is checking each page for the following rules: blink, marquee. This is extremely minimal coverage of 2.2.2, given how rarely those deprecated HTML elements are used today.
2.3.1 Three Flashes or Below Threshold (Level A)	❌	❌
2.4.1 Bypass Blocks (Level A)	❌	✅ shallow, broad	Screen reader automation in this example focuses on a section of the page that does not include bypass mechanisms. axe-core is checking each page for the following rules: bypass.
2.4.2 Page Titled (Level A)	❌	✅ shallow, broad	Screen reader automation in this example does not capture an announcement of page title, although that would be possible to include in the test. axe-core is checking each page for the following rules: document-title.
2.4.3 Focus Order (Level A)	❌	❌	Screen reader automation in this example uses NVDA’s virtual cursor for navigation, which is not the same as keyboard only navigation and thus provides no 2.4.3 coverage.
2.4.4 Link Purpose (In Context) (Level A)	✅ narrow	✅ shallow, broad	Screen reader automation in this example captures 2 links that currently pass 2.4.4. axe-core is checking each page for the following rules: area-alt, link-name.
2.4.5 Multiple Ways (Level AA)	🤔	❌	Screen reader automation could indirectly validate 2.4.5 if multiple tests are written for a website.
2.4.6 Headings and Labels (Level AA)	✅	❌	Screen reader automation in this example ensures most of the page’s headings will continue to be descriptive.
2.4.7 Focus Visible (Level AA)	❌	❌
2.4.11 Focus Not Obscured (Minimum) (Level AA)	❌	❌
2.5.1 Pointer Gestures (Level A)	❌	❌
2.5.2 Pointer Cancellation (Level A)	❌	❌
2.5.3 Label in Name (Level A)	❌	❌*	Screen reader automation can validate accessible name, but does not match that to a visual label. *axe-core has an experimental rule: label-content-name-mismatch
2.5.4 Motion Actuation (Level A)	❌	❌
2.5.7 Dragging Movements (Level AA)	❌	❌
2.5.8 Target Size (Minimum) (Level AA)	❌	✅ shallow, broad	axe-core is checking each page for the following rules: target-size.
3.1.1 Language of Page (Level A)	❌	✅	Screen reader automation in this example does not include language validation. Screen readers do detect programmatic language, but Guidepup does not provide access. axe-core is checking each page for the following rules: html-has-lang, html-lang-valid, html-xml-lang-mismatch. But axe-core cannot validate that the language is correct.
3.1.2 Language of Parts (Level AA)	❌	✅	Screen reader automation in this example does not include language validation. Screen readers do detect programmatic language, but Guidepup does not provide access. axe-core is checking each page for the following rules: valid-lang.
3.2.1 On Focus (Level A)	❌	❌	Screen reader automation in this example uses NVDA’s virtual cursor for navigation, which is not the same as keyboard only navigation and thus provides no 3.2.1 coverage.
3.2.2 On Input (Level A)	🤔	❌	Screen reader automation in this example does not provide any input. If inputs were added, screen reader automation could provide 3.2.2 coverage, but the test author needs to be aware of 3.2.2 and include coverage for it
3.2.3 Consistent Navigation (Level AA)	🤔	❌	Screen reader automation could indirectly validate 3.2.3 if multiple tests are written for a website.
3.2.4 Consistent Identification (Level AA)	🤔	❌	Screen reader automation could indirectly validate 3.2.4 if multiple tests are written for a website.
3.2.6 Consistent Help (Level A)	🤔	❌	Screen reader automation could indirectly validate 3.2.6 if multiple tests are written for a website.
3.3.1 Error Identification (Level A)	🤔	❌	Screen reader automation in this example doesn’t involve submitting any data. It could be used to a degree, but the test author needs to be aware of 3.3.1 and include coverage for it.
3.3.2 Labels or Instructions (Level A)	🤔	✅ shallow, broad	Screen reader automation in this example doesn’t involve any user input. If it did, the test author needs to be aware of 3.3.2 and include coverage for it, but the screen reader couldn’t evaluate whether the same information is available visually. axe-core is checking each page for the following rules: form-field-multiple-labels.
3.3.3 Error Suggestion (Level AA)	🤔	❌	Screen reader automation in this example doesn’t involve any user input. If it did, the test author needs to be aware of 3.3.3 and include coverage for it, but the screen reader couldn’t evaluate whether the same information is available visually.
3.3.4 Error Prevention (Legal, Financial, Data) (Level AA)	🤔	❌	Screen reader automation in this example doesn’t involve any user input. If it did, the test author needs to be aware of 3.3.4 and include coverage for it, but the screen reader couldn’t evaluate whether the same information is available visually.
3.3.7 Redundant Entry (Level A)	🤔	❌	Screen reader automation in this example doesn’t involve any user input. If it did, the test author needs to be aware of 3.3.7 and include coverage for it, but the screen reader couldn’t evaluate whether the same information is available visually.
3.3.8 Accessible Authentication (Minimum) (Level AA)	🤔		Screen reader automation in this example doesn’t involve authentication. If it did, the test author needs to be aware of 3.3.8 and include coverage for it, but the screen reader couldn’t evaluate visual aspects.
4.1.2 Name, Role, Value (Level A)	✅ narrow	✅ moderately deep, broad	Screen reader automation in this example only validated 2 UI components, both links. However it’s capable of deep 4.1.2 validation via interactions. axe-core is checking each page for the following rules: area-alt, aria-allowed-attr, aria-braille-equivalent, aria-command-name, aria-conditional-attr, aria-deprecated-role, aria-hidden-body, aria-hidden-focus, aria-input-field-name, aria-prohibited-attr, aria-required-attr, aria-roles, aria-toggle-field-name, aria-tooltip-name, aria-valid-attr-value, aria-valid-attr, button-name, duplicate-id-aria, frame-title-unique, frame-title, input-button-name, input-image-alt, label, link-name, nested-interactive, select-name, summary-name.
4.1.3 Status Messages (Level AA)	✅	❌	Screen reader automation in this example doesn’t include status messages, but could detect if one was added in error.

↑ Skip to before table

The Guidepup example is clearly providing coverage of 6/55 success criteria. It could potentially expand to include 7 more, depending on whether the test author will react appropriately to page updates that might bring success criteria like 3.3.2 Labels or Instructions into scope.

An axe-core scan of the page could provide some coverage of up to 20 or so success criteria, depending on your level of trust in the lower coverage and experimental rules.

There is nuance behind each of these scores, but it helps build some intuition. In isolation, screen reader automation provides fairly narrow WCAG coverage — which perhaps isn’t surprising, experts have been warning for years that accessibility is more than screen readers. It can deliver much deeper coverage than axe-core in some areas.

Where does this leave us? Is screen reader automation worth it?

Maximizing ROI by going beyond screen readers

At Assistiv Labs, we love the progress being made in screen reader automation and the promise of better interoperability. We also worry that the coverage from screen reader automation in isolation isn’t high enough to warrant investing in for many teams testing their own websites, given how — unlike static analyzers — people must spend time authoring tests.

Could higher coverage change that calculus?

Screen reader testing isn’t the only type of important manual testing that static analysis misses. There’s also at least:

Keyboard testing
Pointer (mouse, touch) testing
Visual contrast testing
Visual reflow/zoom testing
Time-based media (audio, video) testing

What if we automate these as well?

Assistiv robot with lines to symbolize controlling NVDA, Chrome, axe, machine vision, mouse, and keyboard.

With the platform that powers Assistiv Labs’ End-to-End Accessibility Testing, we’re exploring how to efficiently automate as many types of testing as possible to maximize coverage. While our platform is currently in a private beta and can’t currently automate all of the above areas, we’ve made significant progress.

It’s capable of automating screen readers, keyboard, and mouse. It uses machine vision and browser APIs to evaluate visual aspects like contrast. And it makes full use of axe-core as a foundation.

Below is a video of how a multi-modal run of the guidepup.dev example works with Assistiv’s test framework:

Show video description

A grid of 6 browser windows, each overlayed by an icon.

Top left: mouse
Top middle: keyboard
Top right: NVDA
Bottom left: axe-core
Bottom middle: tree icon (representing accessibility tree snapshots)
Bottom right: progress indicator

As the video plays, all 6 browsers show different actions in parallel.

The mouse browser shows a mouse interacting with guidepup.dev, scrolling and clicking the "Get Started" link.

The keyboard browser shows focus moving throughout all links and buttons on the page as the Tab key is pressed. It reaches the bottom of the page then returns to the "Get Started" link and invokes it with Enter.

The NVDA browser shows NVDA's visual focus highlighting and the speech viewer, whose text is too small to be readable. NVDA navigates the page similarly to keyboard.

The axe-core browser has the browser DevTools open with the axe extension installed. It runs a scan on guidepup.dev and then another scan on guidepup.dev/docs/intro.

The tree icon browser has DevTools open to the accessibility tree view. It highlights nodes in the accessibility tree as the appear visually in the browser, moving top-to-bottom on the page.

Finally, the progress indicator browser is a loading spinner that updates at the very end to:

✅ Mouse
✅ Keyboard
✅ NVDA
✅ axe-core
✅ accessibility tree snapshot

Below is an Assistiv Labs test script for the same example page:

↓ Skip to after code

await assistiv.goto("https://www.guidepup.dev/", {
 title: "Screen reader driver for test automation | Guidepup", // 2.4.2 Page Titled
});

await assistiv.scan(); // axe-core with all rules enabled

// Testing the visible and accessible page content for 1.3.1, 1.3.2, etc
await expect(assistiv.locator("body")).toMatchA11ySnapshot([
 {
   role: "region",
   name: "Skip to main content",
   children: [{ role: "link", name: "Skip to main content" }],
   visuallyHidden: true,
 },
 {
   role: "navigation",
   // Capture an issue, this role=navigation has the accessible name "Main"
   // but it is not role=main
   name: new KnownIssue({
     actual: "Main",
     expected: "",
     link: "TODO open issue",
   }),
   children: [
     { role: "link", name: "Guidepup" },
     { role: "link", name: "Docs" },
     { role: "link", name: "API" },
     { ...KnownIssueOpensInNewTabIcon("GitHub") },
   ],
 },
 {
   role: "main",
   children: [
     { role: "heading", name: "Guidepup", level: 1 },
     {
       role: "paragraph",
       children: [
         { role: "#text", name: /./ },
         {
           role: "generic",
           children: [{ role: "#text", name: "🦮" }],
           ariaHidden: true,
         },
       ],
     },
     { role: "link", name: "Get Started" },
     { role: "heading", name: /Reliable/, level: 2 },
     { role: "paragraph", name: /./ },
     { role: "heading", name: "Full control", level: 3 },
     { role: "paragraph", name: /./ },
     { role: "heading", name: /Mirrors/, level: 3 },
     { role: "paragraph", name: /./ },
     { role: "heading", name: "Framework agnostic", level: 3 },
     { role: "paragraph", name: /./ },
     { role: "link", name: "Get Started" },
     { role: "link", name: "GitHub" },
   ],
 },
 {
   role: "contentinfo",
   children: [
     { role: "#text", name: "Docs" },
     {
       role: "list",
       childTemplate: {
         role: "listitem",
         children: [{ role: "link", name: /./ }],
       },
     },
     { role: "#text", name: "Community" },
     {
       role: "list",
       childTemplate: {
         role: "listitem",
         children: [{ ...KnownIssueOpensInNewTabIcon(/./) }],
       },
     },
     { role: "#text", name: "GitHub" },
     {
       role: "list",
       childTemplate: {
         role: "listitem",
         children: [{ ...KnownIssueOpensInNewTabIcon(/./) }],
       },
     },
     { role: "#text", name: /Copyright/ },
   ],
 },
]);
// Tracking a 1.3.1 failure, link text is often followed by an icon to indicate
// whether the link will open a new tab, but this icon is hidden, a screen
// reader user will not benefit from the visual information
const KnownIssueOpensInNewTabIcon = (name) => new KnownIssue({
 actual: {
   role: "link",
   name,
   children: [
     { role: "#text", name },
     {
       role: "img", // opens in new tab icon
       ariaHidden: true,
     },
   ],
 },
 expected: {
   role: "link",
   name: `${name} (opens in new tab)`,
   children: [
     { role: "#text", name },
     {
       role: "img", // opens in new tab icon
       name: "(opens in new tab)",
     },
   ],
 },
 link: "TODO open issue",
});

// Ensure entire page can be tabbed through for 2.1.1, 2.1.2.
await assistiv.getByRole("link").last().focus({
 key: "Tab",
 focusChecks: "require-visible-focus", // automatic 2.4.7 validation
});

// Navigate to the 2nd "Get Started" link
const GetStartedLink = assistiv.getByRole("link", { name: "Get Started" }).last();
await GetStartedLink.focus({ key: "Shift+Tab" });
await GetStartedLink.invoke(); // Mouse click, or keyboard Enter, or via screen reader virtual cursor

// Ensure focus and new page load successfully
await expect(assistiv.locator("body")).toBeFocused({
 // 4.1.3 failure. This is a single page app (SPA) performing a client side
 // route change/navigation. While focus is reset to body, NVDA does not
 // announce anything to indicate when the new page is ready
 speech: new KnownIssue({
   actual: "", // silence
   expected: "Getting Started | Guidepup", // page title
   link: "TODO open issue",
 }),
});

↑ Skip to before code

The test detects multiple accessibility issues that previously slipped through:

Links to external sites are indicated by an icon which is available visually, but not programmatically/accessibly (1.3.1 failure)
The navigation region has an accessible name of “Main” and is followed by the main region, which could be confusing (best practice)
Invoking the “Get Started” link doesn’t cause NVDA to announce the new page (4.1.3 failure)

These issues are embedded into the test via KnownIssue to ensure they still reproduce and to detect fixes in real time.

The Assistiv Labs test framework also provides guidance to human reviewers on how to respond when things change. For example, if a new image containing text is detected, the framework helps the reviewer understand if it’s a 1.4.5 failure and what to do in response.

This approach changes WCAG coverage substantially.

↓ Skip to after table

WCAG 2.2 SC	Assistiv E2E	Notes
1.1.1 Non-text Content (Level A)	✅	All non-text content is detected and issues are tracked. If more non-text content is added in the future, it will be flagged for human review so the presence of purposeful alternative text can be evaluated.
1.2.1 Audio-only and Video-only (Prerecorded) (Level A)	✅ partial	All audio and video is detected (if present) and presence of alternatives are validated, although accuracy is not.
1.2.2 Captions (Prerecorded) (Level A)	✅ partial	All audio and video with audio is detected (if present) and presence of captions is validated, although accuracy of captions is not.
1.2.3 Audio Description or Media Alternative (Prerecorded) (Level A)	✅ partial	All video is detected (if present) and presence of audio description or other alternative is validated, although its accuracy is not.
1.2.4 Captions (Live) (Level AA)	✅ partial	All video is detected (if present) and presence of captions is validated, although accuracy is not.
1.2.5 Audio Description (Prerecorded) (Level AA)	✅ partial	All video is detected (if present) and presence of audio description is validated, although its accuracy is not.
1.3.1 Info and Relationships (Level A)	✅	Page’s visual and programmatically accessible content are both captured and evaluated to ensure they always convey equivalent information. As content changes, it will be flagged for human review.
1.3.2 Meaningful Sequence (Level A)	✅	Page’s programmatic reading sequence is captured and evaluated to ensure it is always correct. As content changes, it will be flagged for human review.
1.3.3 Sensory Characteristics (Level A)	❌
1.3.4 Orientation (Level AA)	✅ shallow	Equivalent to axe-core checking for the following rule: css-orientation-lock.
1.3.5 Identify Input Purpose (Level AA)	✅	The label and type of inputs (if present) are captured and evaluated to ensure they don’t unexpectedly change. As inputs change, they will be flagged for human review.
1.4.1 Use of Color (Level A)	✅ shallow	Equivalent to axe-core checking each page for the following rule: link-in-text-block.
1.4.2 Audio Control (Level A)	✅ shallow	Equivalent to axe-core checking each page for the following rules: no-autoplay-audio.
1.4.3 Contrast (Minimum) (Level AA)	✅ moderately deep	Equivalent to axe-core checking each page for the following rules: color-contrast.
1.4.4 Resize Text (Level AA)	✅ shallow	Equivalent to axe-core checking each page for the following rules: meta-viewport. Which prevents a common issue but falls far short of full 1.4.4 validation.
1.4.5 Images of Text (Level AA)	✅	Images of text are detected (if present) and tracked if they cannot be converted to text. As images are added, they will be flagged for human review.
1.4.10 Reflow (Level AA)	❌
1.4.11 Non-Text Contrast (Level AA)	✅ shallow	Focus states are validated for contrast, which falls far short of full 1.4.11 validation.
1.4.12 Text Spacing (Level AA)	✅ shallow	Equivalent to axe-core checking each page for the following rules: avoid-inline-spacing.
1.4.13 Content on Hover or Focus (Level AA)	❌
2.1.1 Keyboard (Level A)	✅	Keyboard only interactions performed.
2.1.2 No Keyboard Trap (Level A)	✅	Ensures keyboard can complete test without hitting a trap.
2.1.4 Character Key Shortcuts (Level A)	❌
2.2.1 Timing Adjustable (Level A)	✅ shallow	Equivalent to axe-core checking each page for the following rules: meta-refresh. Which prevents a common issue but falls far short of full 2.2.1 coverage.
2.2.2 Pause, Hide, Stop (Level A)	❌
2.3.1 Three Flashes or Below Threshold (Level A)	❌
2.4.1 Bypass Blocks (Level A)	✅ partial	Skip links and regions are captured. Skip link not interacted with.
2.4.2 Page Titled (Level A)	✅	Descriptive page title is validated.
2.4.3 Focus Order (Level A)	✅ partial	Focus order is validated at major transitions. Incorrect tab order may not be detected in some cases.
2.4.4 Link Purpose (In Context) (Level A)	✅	Links on the page are captured and validated with their surrounding visual and programmatic context. As links change, they will be flagged for human review.
2.4.5 Multiple Ways (Level AA)	🤔	Could indirectly validate 2.4.5 if multiple tests are written for a website.
2.4.6 Headings and Labels (Level AA)	✅	Headings and labels on the page are captured and validated for descriptiveness.
2.4.7 Focus Visible (Level AA)	✅	Keyboard focus is validated for visibility.
2.4.11 Focus Not Obscured (Minimum) (Level AA)	✅	Keyboard focus is validated for visibility.
2.5.1 Pointer Gestures (Level A)	✅	Ensures single pointer interactions can complete test.
2.5.2 Pointer Cancellation (Level A)	❌
2.5.3 Label in Name (Level A)	✅	Visual label and programmatic names of UI components are captured and validated.
2.5.4 Motion Actuation (Level A)	✅	Ensures test can be completed without motion.
2.5.7 Dragging Movements (Level AA)	✅	Ensures single pointer interactions can complete test.
2.5.8 Target Size (Minimum) (Level AA)	✅	Equivalent to axe-core checking each page for the following rules: target-size.
3.1.1 Language of Page (Level A)	✅ shallow	Equivalent to axe-core checking each page for the following rules: html-has-lang, html-lang-valid, html-xml-lang-mismatch. But axe-core cannot validate that the language is correct.
3.1.2 Language of Parts (Level AA)	✅ shallow	Equivalent to axe-core checking each page for the following rules: valid-lang. But axe-core cannot validate that the language is correct.
3.2.1 On Focus (Level A)	✅	Keyboard focus is validated for no unexpected changes.
3.2.2 On Input (Level A)	✅	Ensures no unexpected changes automatically occur when interacting with UI components. As UI components change, they are flagged for human review and optionally to be written into the test interactively.
3.2.3 Consistent Navigation (Level AA)	🤔	Screen reader automation could indirectly validate 3.2.3 if multiple tests are written for a website.
3.2.4 Consistent Identification (Level AA)	🤔	Screen reader automation could indirectly validate 3.2.4 if multiple tests are written for a website.
3.2.6 Consistent Help (Level A)	🤔	Screen reader automation could indirectly validate 3.2.6 if multiple tests are written for a website.
3.3.1 Error Identification (Level A)	✅	Ensures no errors occur. As UI changes in ways that may introduce errors, they are flagged for human review and to be written into the test interactively.
3.3.2 Labels or Instructions (Level A)	✅	Labels and instructions (if present) are captured and evaluated. As labels and instructions change, they are flagged for human review.
3.3.3 Error Suggestion (Level AA)	✅	Ensures no errors occur. As UI changes in ways that may introduce errors, they are flagged for human review and to be written into the test interactively.
3.3.4 Error Prevention (Legal, Financial, Data) (Level AA)	✅	Ensures no errors occur. As UI changes in ways that may introduce errors, they are flagged for human review and to be written into the test interactively in a way that validates 3.3.4’s unique requirements.
3.3.7 Redundant Entry (Level A)	✅	Inputs (if present) are captured and redundant inputs are tracked. As inputs change, they are flagged for human review.
3.3.8 Accessible Authentication (Minimum) (Level AA)	✅	Authentication (if present) is tested to ensure no cognitive function tests are required. If authentication is added, it’s flagged for human review and to be written into the test interactively.
4.1.2 Name, Role, Value (Level A)	✅	UI component name, role, states, properties, and values are captured. Key UI components are interactively validated with NVDA.
4.1.3 Status Messages (Level AA)	✅	Status messages (if present) are validated with NVDA.

↑ Skip to before table

44/55 success criteria (80%) receive validation. Again, there is nuance in each, but this process helps build intuition — by automating screen readers, keyboard, mouse, and visual checks, you can reach a very high level of accessibility coverage.

So does this change the calculus?

In our experience, yes. This automation approach frequently detects bugs that were otherwise slipping through the cracks. Bugs in focus management, accessible names, semantic structure, live regions, and much more. When automation can continuously detect these issues, they can be fixed much more efficiently, saving hundreds of hours for developers, project managers, and QA.

Wrapping up

Traditional static analysis tools set a solid floor for automated accessibility testing, but the evolving landscape of screen reader automation — exemplified by projects like W3C AT Driver API and Guidepup — represents a crucial step forward in addressing their inherent coverage gaps.

Automating screen readers allows for deeper, more interactive, highly targeted testing, revealing issues that static checks miss, particularly concerning user experience with assistive technologies. However, the true potential for scaling accessibility testing and maximizing ROI lies in taking a comprehensive approach.

By coupling automated screen reader testing with other critical checks such as keyboard, pointer, and visual evaluations, Assistiv Labs’ end-to-end accessibility testing provides a complete solution. Our holistic strategy significantly expands WCAG coverage, enabling continuous detection of a wider range of accessibility bugs and ultimately saving substantial time and resources for development teams.

Get in touch if you’re interested in learning more.