Level Access

Author: Level Access

Captioning plays an essential role in content accessibility. Many people who are Deaf or Hard-of-Hearing (HoH), as well as those with some types of cognitive and auditory processing disabilities, use closed captions to understand audiovisual material. And with the rise of Artificial Intelligence (AI)-powered automated captioning tools—called AI captioning—the practice of adding captioning to audiovisual content is growing even more widespread.

However, AI captioning has important limitations, and if your organization is trying to comply with accessibility laws, this approach may not be sufficient. Read more to learn about the different types of captioning and the legal requirements for captions. You’ll also find a quick checklist for captioning best practices.

What are the different types of video captions?

While you may be familiar with subtitles or closed captions, there are actually several different types of captioning, and each has its own benefits:

  • AI captions: AI captions are computer-generated, or automated, captions that capture the dialogue in multimedia content. AI captioning tools make adding captions to videos easier for those publishing content, but without human supervision, they are often inaccurate and incomplete.
  • Closed captions: This type of captioning is often started with software but requires human oversight to complete. Closed captions capture dialogue as well as important sounds, sound effects, and music within media, all of which have to be added by human captioners. Because they are reviewed and corrected by people, closed captions are both more accurate and more complete than AI captions, and they are the only type discussed here that meets Web Content Accessibility Guidelines (WCAG) standards.

It’s worth noting that, in some countries outside of the U.S., the term “closed captions” is not used. For example, in the U.K., the term “subtitles for the Deaf and Hard-of-Hearing (SDH)” is used instead.

  • Real-time captions: Another term for Communication Access Real-time Translation (CART), real-time captions are human-generated captions produced live, as the audio occurs. They are most commonly provided during live speech, as opposed to within recorded media.

How can I tell the difference between types of captions?

Captions can follow a variety of visual formats, but they’re most frequently rendered in white or yellow text on a black background. Though captions sometimes appear without a background, it’s important to ensure that there’s sufficient color contrast between caption text and the content behind it for accessibility.
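For web video that uses WebVTT caption tracks, caption appearance can be adjusted with the ::cue pseudo-element defined in the WebVTT specification. The sketch below shows one way to enforce high contrast; the specific color values are illustrative, and browser support for ::cue styling varies.

```css
/* Illustrative styling for WebVTT caption cues on an HTML5 video.
   The values here are examples, not requirements. */
video::cue {
  color: #ffffff;                        /* white caption text */
  background-color: rgba(0, 0, 0, 0.9);  /* near-opaque black backing */
}
```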

To tell the difference between types of captions, review the captions to understand if they include descriptions of surrounding audio. This information is typically provided in brackets: for example, [applause], [laughter], or [music plays] (music can also be shown by a music note symbol or italics). If these descriptions are present, you may be accessing closed captions.
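As a concrete illustration, here is what those bracketed descriptions look like inside a WebVTT caption file, a common caption format for web video. The timestamps and text are made up:

```text
WEBVTT

00:00:01.000 --> 00:00:04.000
[upbeat music plays]

00:00:04.500 --> 00:00:07.000
Welcome back, everyone.

00:00:07.200 --> 00:00:08.500
[applause]
```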

You can also review the captions for accuracy: a regular pattern of errors or incomplete speech generally indicates the captions are generated by AI. Real-time captioning, in contrast, is usually displayed in a private transcript for a specific user. If it is displayed with a video, it will most often appear in all capital letters, which helps the captioners output captions at the speed of speech.

When are captions required?

As an accessibility best practice, all video and audio recordings should include closed captions. In fact, the U.S. Federal Communications Commission (FCC) has required broadcasting companies, cable companies, and satellite services to provide closed captioning for all new, non-exempt English-language programming since 2006, and since 2012, FCC rules have extended captioning requirements to online video that previously aired on television.

Additionally, many U.S. and international laws, including the Americans with Disabilities Act (ADA), Section 508 of the Rehabilitation Act of 1973, and the European Accessibility Act (EAA) mandate the use of closed captioning in publicly available audiovisual content.

To comply with these laws, organizations must satisfy the criteria for captioning outlined in WCAG. The recent update to Title II of the ADA designates WCAG 2.1 AA as the standard for digital accessibility. Section 508 cites WCAG 2.0 AA as its compliance standard. And the presumptive EAA compliance standard, EN 301 549, incorporates WCAG 2.1 AA.

These laws impact a wide range of organizations across the public and private sectors. While Title II of the ADA applies to state and local governments and affects their vendors, Section 508 applies to federal government agencies, organizations that receive federal funding, and vendors who do business with these organizations. The EAA covers a wide range of products and services circulating in the European Union (EU) and applies to most companies with EU-based customers.

What are best practices for captioning?

WCAG 2.1 Level A (and WCAG 2.0 Level A as the preceding standard) requires that captions be provided for all prerecorded audio content in synchronized media, with captions defined as visual and/or text alternatives for both speech and non-speech audio information. Including auditory information beyond speech is critical for making closed captioning conform with WCAG. Because AI captioning only captures dialogue, it does not meet WCAG criteria without human intervention.
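In web contexts, the standard way to attach a caption file to a video is the HTML track element with kind="captions". The file names below are hypothetical placeholders:

```html
<!-- A video with an author-supplied caption track.
     The src file names are placeholders. -->
<video controls>
  <source src="webinar.mp4" type="video/mp4">
  <track kind="captions" src="webinar-captions.vtt"
         srclang="en" label="English captions" default>
</video>
```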

Checking the following four items will help you confirm that your captioning conforms with WCAG and complies with regulations—regardless of the method you use to generate those captions:

  • Accurate: Captions should accurately reflect both spoken words and non-speech sounds.
  • Synchronous: Captions must be in sync with the audio of the program. Text should coincide with corresponding spoken words and sounds and be displayed on screen at a speed that users can read.
  • Complete: Captions must be present throughout the entirety of a program.
  • Properly placed: Closed captions should not block any important on-screen visuals, overlap with one another, or run off the edge of the screen.
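Some of these checks can be partially automated. As a rough sketch (assuming WebVTT-formatted captions; the function name and the readability threshold are our own), the following script flags out-of-order or overlapping cues, cues that may be too fast to read, and caption files that contain no bracketed non-speech descriptions. Accuracy still requires human review:

```python
import re

# Matches WebVTT cue timing lines, e.g. "00:00:01.000 --> 00:00:04.000"
CUE_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})\.(\d{3}) --> (\d{2}):(\d{2}):(\d{2})\.(\d{3})"
)

def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def check_captions(vtt_text, max_chars_per_second=20):
    """Return a list of human-readable issues found in a WebVTT caption file.

    Covers three of the four checklist items: synchrony (cues in order and
    non-overlapping), readability (characters per second), and presence of
    non-speech descriptions such as [music]. Accuracy still needs a human.
    """
    issues = []
    cues = []  # (start_seconds, end_seconds, cue_text)
    lines = vtt_text.splitlines()
    i = 0
    while i < len(lines):
        m = CUE_RE.search(lines[i])
        if m:
            start = _seconds(*m.groups()[:4])
            end = _seconds(*m.groups()[4:])
            text = []
            i += 1
            # Cue text runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                text.append(lines[i].strip())
                i += 1
            cues.append((start, end, " ".join(text)))
        i += 1

    prev_end = 0.0
    for start, end, text in cues:
        if start < prev_end:
            issues.append(f"cue at {start:.3f}s overlaps the previous cue")
        if end <= start:
            issues.append(f"cue at {start:.3f}s ends before it starts")
        elif len(text) / (end - start) > max_chars_per_second:
            issues.append(f"cue at {start:.3f}s may be too fast to read")
        prev_end = end

    if not any("[" in text for _, _, text in cues):
        issues.append("no bracketed non-speech descriptions found")
    return issues
```

A run over a file with overlapping cues would return one issue per problem found; an empty list means only human review (for accuracy and placement) remains.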

Closed captions are just one piece of the puzzle when it comes to digital accessibility. At Level Access, we offer the expertise, training, testing, and legal support to help with all facets of digital accessibility, from achieving WCAG conformance to meeting your compliance goals. Contact us to learn more about our solution.

Frequently asked questions

Why do content providers need to add captioning to media? 

Content providers need to generate and provide their own captioning for multimedia, even though automated captions are a common user-enabled option on many media delivery platforms. In most cases, AI-generated captions do not meet WCAG success criteria because they are inaccurate, incomplete, or missing information about non-speech audio content like music or sound effects. For this reason, it is important for organizations to implement captioning themselves and to modify any captioning provided by software so that it contains all the auditory information required for closed captions.

Are subtitles the same as closed captions?

While the word “subtitles” is often used interchangeably with “closed captions,” subtitles actually aren’t a type of captioning at all. Subtitles are language translations of audio, designed to help people who can hear, but don’t speak the language used, understand dialogue.

Where should closed captions appear on screen?

Many users of closed captions struggle when captions are present but interfere with other information in the media. For example, closed captioning in presentations may overlap with on-screen text, making it difficult to access both the captions and the information they cover. Overlap like this also makes the captions non-conformant with WCAG based on placement (and possibly contrast). Content creators should consider providing dedicated space for captions within their media, usually at the bottom of the screen.

Is AI captioning still useful?

Even though AI captioning has flaws, it is a useful starting point. The key is to remember that AI captioning needs human assistance to become closed captioning. Content providers should review all automated captions for accuracy and completeness, then add descriptions of non-speech audio to bring those captions up to WCAG standards.