Using a Windows screen reader on the web for the first time can be daunting and confusing. This is because Windows screen readers introduce new access paradigms that do not always match what is displayed visually. Windows screen readers offer several modes that allow a user to review and interact with web content. To use a screen reader on the web successfully, the user must be able to determine which mode is currently active, understand how each mode operates, and know how to switch modes as required. Developers also need to be aware of the screen reader modes used for accessing different types of content, and of the effect their code will have on the user experience when a particular mode is used.

How do screen readers work in different modes?

Document Mode

The most common mode used to access web pages with Windows screen readers will be referred to here as “document” mode. Specific screen readers use their own proprietary terms for it, such as “virtual” or “browse” mode. This is the default mode invoked when a page loads in the browser. It may be overridden by web pages that auto-focus a form field or apply certain WAI-ARIA roles.

How do screen readers work in document mode? In document mode, the user is interacting with a cached copy of the web page. The screen reader stores a copy of the page and allows the user to traverse it as if it were a word processing document. This is accomplished by providing the user with an invisible cursor that is free to roam the document by the same units of text that would be available in a word processing application: characters, words, lines, paragraphs, etc. JAWS refers to this cursor as the “virtual cursor.” As the JAWS virtual cursor moves around on a web page, no visible indicator of its location is shown. This allows a user to move the virtual cursor outside the range of the content currently displayed on-screen. Window-Eyes, by contrast, does show a visual indicator of its document navigation cursor. Document mode usually also provides commands for selecting text by these same units so that it can be manipulated.

There are several advantages to document mode access. The primary advantage is that the invisible cursor can move to content that is not keyboard-focusable. This allows for review of content contained in between elements that receive keyboard focus. As keyboard focus is usually only provided to operable controls such as hyperlinks and form fields, text in between these elements can easily be accessed while document mode is active.

In addition to allowing users to navigate by units of text, most modern screen readers provide keystrokes within document mode to navigate by web page elements. These are frequently called “navigation quick keys.” For example, headings are a common way for users to scan web pages non-visually and identify the different sections of the page. Screen readers usually provide a mechanism to navigate to the next heading element on a page, and also to navigate to a heading of a designated level. The letter “H” is commonly used to jump to the next heading, and Shift+H navigates in reverse. The number keys are commonly used to navigate by a specific heading level (e.g. the number 1 on the keyboard moves between heading level 1 (h1) elements).
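The heading quick keys described above can be pictured with a small toy model. This is only a sketch: the page is reduced to a list of tag names in document order, and `headingLevel` and `findHeading` are hypothetical helpers introduced for illustration, not part of any screen reader's API.

```typescript
// Toy model: the page is a flat list of tag names in document order.
// Returns 1-6 for h1-h6 and 0 for anything that is not a heading.
function headingLevel(tag: string): number {
  const match = /^h([1-6])$/i.exec(tag);
  return match ? Number(match[1]) : 0;
}

// Returns the index of the next heading from the cursor position, or -1.
// direction 1 models "H", direction -1 models Shift+H.
// Passing a level models the number keys (for example 1 for h1 only).
function findHeading(
  tags: string[],
  cursor: number,
  direction: 1 | -1,
  level?: number
): number {
  for (let i = cursor + direction; i >= 0 && i < tags.length; i += direction) {
    const found = headingLevel(tags[i] || "");
    if (found > 0 && (level === undefined || found === level)) {
      return i;
    }
  }
  return -1;
}
```

With the cursor on the first element of `["h1", "p", "h2", "p", "h2", "h1"]`, calling `findHeading(tags, 0, 1)` lands on the first h2, while `findHeading(tags, 0, 1, 1)` skips ahead to the final h1.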

Other common element types that screen readers allow users to jump to include visited and unvisited links, lists, tables, blockquotes, images (graphics), frames, WAI-ARIA landmarks, form fields, and specific types of form controls (edit boxes, buttons, checkboxes, radio buttons, drop-downs/combo boxes, etc.). Table navigation commands are usually provided in document mode as well. These commands allow the user to navigate between cells within a table and traverse the table by row or column. Document mode is also usually where the screen reader announces the relationship between table headers and data cells. Finally, many screen readers provide additional features in document mode to locate text, such as searching for text strings, jumping to a specific line number, or setting invisible place markers on the page.

When document mode is active, the screen reader intercepts any keystrokes that it uses for “navigation quick keys” and does not pass them on to the web page. For example, if a webmail application uses the keys “I” and “J” to navigate to the next and previous message respectively, the screen reader will intercept these keystrokes and interpret them as navigation quick keys. Similarly, the cursor arrow keys are intercepted and repurposed to move the document cursor around on the web page. This poses a problem when a website provides keyboard access through keystrokes that the screen reader intercepts in document mode and users would normally access the site in that mode.
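The interception described above can be sketched as a simple routing decision. This is a toy model under stated assumptions, not how any particular screen reader is implemented: `ReaderMode`, `QUICK_KEYS`, and `routeKey` are hypothetical names, and the key list is a small assumed sample rather than any product's real key map.

```typescript
// Toy model of keystroke routing; not taken from any real screen reader.
type ReaderMode = "document" | "application";
type KeyTarget = "screenReader" | "webPage";

// Assumed sample of keys a screen reader might reserve in document mode.
const QUICK_KEYS: string[] = [
  "h", "i", "j", "1", "2", "3",
  "ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight",
];

// Decides who receives a keystroke in the given mode.
function routeKey(mode: ReaderMode, key: string): KeyTarget {
  // Single characters are compared case-insensitively; named keys are kept as-is.
  const normalized = key.length === 1 ? key.toLowerCase() : key;
  if (mode === "document" && QUICK_KEYS.indexOf(normalized) !== -1) {
    return "screenReader"; // the page's own key handler never sees the key
  }
  return "webPage";
}
```

In this model, a page shortcut bound to “J” only reaches the page when `routeKey("application", "j")` applies; in document mode the same key is consumed by the screen reader.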

Another area where document mode can pose challenges is with portions of a web page that update dynamically without the page being reloaded from the server. When a page updates dynamically and the screen reader is unaware of the update, the cached version of the page the user is interacting with can become “stale,” leaving it out of sync with the version rendered by the browser. This is why client-side content changes can create problems for users of screen readers. Fortunately, there are techniques to alert screen readers that a content change has occurred on the page. One of the most common techniques involves setting focus to the new or changed content on the page. Another technique involves specifying a WAI-ARIA live region on the page to instruct the screen reader to monitor the region for content changes.
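The two techniques mentioned above can be sketched as follows. This is a minimal illustration, not a complete pattern: `UpdatableElement`, `focusChangedContent`, `initLiveRegion`, and `announce` are hypothetical names. The interface is kept structural so that a real DOM `HTMLElement` satisfies it, while the code can still be exercised without a browser. The choice of `aria-live="polite"` is an assumption; `assertive` may suit urgent messages.

```typescript
// Minimal structural type: a real DOM HTMLElement satisfies it, and so does a stub.
interface UpdatableElement {
  textContent: string | null;
  setAttribute(attrName: string, value: string): void;
  focus(): void;
}

// Technique 1: move keyboard focus to the new or changed content.
function focusChangedContent(target: UpdatableElement, text: string): void {
  target.textContent = text;
  // tabindex="-1" lets script focus an element that is not normally focusable,
  // without adding it to the Tab order.
  target.setAttribute("tabindex", "-1");
  target.focus();
}

// Technique 2: mark a container as a live region once, ideally before any update.
function initLiveRegion(region: UpdatableElement): void {
  region.setAttribute("aria-live", "polite");
}

// Later updates to the region's text are what the screen reader is asked to monitor.
function announce(region: UpdatableElement, message: string): void {
  region.textContent = message;
}
```

In a browser, the element passed in would typically come from a call such as `document.getElementById(...)` with an id of the author's choosing. How and when each screen reader announces the change still needs to be verified by testing.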

Application Mode

When users need to interact with a web page, for example to enter text into a form field, document mode must be disabled and the screen reader must be switched into “application mode.” Specific screen readers use the proprietary terms “Forms” mode and “Focus” mode for it. Application mode is necessary to interact with forms, dialogs, and web applications.

How do screen readers work in application mode? In application mode, all of the keystrokes that would normally manipulate the invisible document cursor are instead passed through to the web page. This allows a user to enter text into a form field or use the arrow keys to traverse the options in a drop-down. Application mode is usually invoked by placing focus on a form field and pressing a keystroke such as the Enter key. Some screen readers will automatically switch into application mode when a form field is encountered. Web page authors can also invoke application mode by applying certain WAI-ARIA roles, such as role="dialog" or role="application", to page elements.

Because the screen reader passes keystrokes through to the web page in application mode, users will only be able to review content that can receive keyboard focus while this mode is active. When application mode is active, users navigate between keyboard-focusable elements using the Tab key, as there is no invisible cursor available to review the surrounding text. This is one reason why every form field needs to be associated with a label. Labels become even more crucial when authors invoke application mode by applying WAI-ARIA roles such as role="application" or role="dialog". When these WAI-ARIA roles are used, some screen readers will switch into application mode and partially or completely disable the user’s ability to switch back to document mode. Whether the user can revert to document mode depends on the screen reader, its version, and the location where the role is applied in the document hierarchy.
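The labeling requirement above lends itself to a simple automated check. The sketch below works on a simplified, hypothetical description of a form rather than on a live page: `FormFieldInfo` and `findUnlabeledFields` are names introduced here, and the check only considers three label sources (a label element's for attribute, aria-label, and aria-labelledby). It ignores others, such as a label element that wraps its field.

```typescript
// Simplified, hypothetical description of one form field and its labeling hooks.
interface FormFieldInfo {
  id: string;
  ariaLabel?: string;      // value of aria-label, if any
  ariaLabelledby?: string; // value of aria-labelledby, if any
}

// Returns the ids of fields that have no label source in this simplified model.
// labelForIds holds the "for" attribute values of the page's label elements.
function findUnlabeledFields(
  fields: FormFieldInfo[],
  labelForIds: string[]
): string[] {
  return fields
    .filter((field) => {
      const hasLabelElement = labelForIds.indexOf(field.id) !== -1;
      const hasAriaLabel = (field.ariaLabel || "").trim() !== "";
      const hasLabelledby = (field.ariaLabelledby || "").trim() !== "";
      return !(hasLabelElement || hasAriaLabel || hasLabelledby);
    })
    .map((field) => field.id);
}
```

A field flagged by this check is one that a user tabbing through the form in application mode would probably hear announced without any name. A clean result does not replace testing with an actual screen reader.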

For example, applying role="application" to the body element of a web page prevents users of NVDA from entering document mode for any portion of the page, as this instructs NVDA to treat the entire page as a web application. When these roles are used, care must be taken to ensure that the region of the page carrying them can be understood solely from the labels or instructions associated with its keyboard-focusable elements, as users will be unable to review any other content while application mode is active. Conversely, these roles should not be placed on a portion of a web page that is really a document, that is, an element whose descendants include static, non-focusable content that must be reviewed in order to understand the page.
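One way to reason about where these roles are safe is to look for static text that sits inside such a region without being part of a focusable element. The sketch below does this over a toy page tree. It rests on several assumptions: `PageNode`, `MODE_SWITCHING_ROLES`, and `findTextAtRisk` are hypothetical names, the two roles are the ones named in the text, the content of a focusable element is assumed to be announced when it receives focus, and text exposed through label associations is not accounted for.

```typescript
// Toy page tree; a real audit would walk the DOM instead.
interface PageNode {
  tag: string;
  role?: string;
  focusable?: boolean;
  text?: string;
  children?: PageNode[];
}

// Roles named in the text as able to trigger application mode.
const MODE_SWITCHING_ROLES: string[] = ["application", "dialog"];

// Collects static text inside a mode-switching region that is not part of a
// focusable element, in document order.
function findTextAtRisk(node: PageNode, insideRegion: boolean = false): string[] {
  if (node.focusable) {
    return []; // assumed to be announced when the element receives focus
  }
  const inRegion =
    insideRegion || MODE_SWITCHING_ROLES.indexOf(node.role || "") !== -1;
  const found: string[] = [];
  if (inRegion && node.text) {
    found.push(node.text);
  }
  (node.children || []).forEach((child) => {
    findTextAtRisk(child, inRegion).forEach((text) => found.push(text));
  });
  return found;
}
```

A non-empty result suggests either moving the role to a narrower element or exposing the text through the labels and instructions of focusable elements. Whether the text is actually unreachable depends on the screen reader and version, as noted above.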

Testing your web page for screen readers

When testing your web page, it is important to know which mode your users will access it in and how screen readers will behave in each mode. For example, a user may review a form in document mode to gain an initial understanding of the form’s layout and content, then switch to application mode in order to complete the form. You should be aware of how screen readers render the form in both of these modes. Another example might involve using the WAI-ARIA aria-labelledby property to label each checkbox in a data table with the concatenated content of other columns in its row. A user might use application mode to complete form fields within the table, but review the table itself in document mode. This is why it is important to verify that screen readers support any WAI-ARIA roles, states, and properties used on a page in each of the modes that would likely be used to access it.
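The aria-labelledby example above can be made concrete with a short sketch. The helper names `buildLabelledBy` and `resolveLabel`, and the cell ids used with them, are hypothetical. `resolveLabel` only approximates how a label is assembled from the referenced elements (their text, in the order the ids are listed, separated by spaces) and is meant for working out what to expect during testing, not as a substitute for it.

```typescript
// Builds the aria-labelledby value for a checkbox from the ids of the cells
// in its row that should serve as its label, in reading order.
function buildLabelledBy(cellIds: string[]): string {
  return cellIds
    .map((id) => id.trim())
    .filter((id) => id !== "")
    .join(" ");
}

// Approximates the resulting label: the text of each referenced element,
// in the order listed, joined by spaces. Ids with no matching text are skipped.
function resolveLabel(
  labelledBy: string,
  textById: { [id: string]: string }
): string {
  const parts: string[] = [];
  labelledBy.split(/\s+/).forEach((id) => {
    const text = textById[id];
    if (id !== "" && typeof text === "string") {
      parts.push(text.trim());
    }
  });
  return parts.join(" ");
}
```

For a row whose name and date cells carry the ids `row3-name` and `row3-date`, the checkbox would receive `aria-labelledby="row3-name row3-date"`, and the expected announcement is the two cell texts read in that order. Comparing that expectation with what each screen reader actually announces, in both document and application mode, is the test described above.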